1 Introduction

How can we compute the quantiles of a function evaluated on the agent distribution when we can only fit part of the distribution in memory at one time? We introduce the use of t-Digests for this purpose. t-Digests have two main properties that make them suited to this application: (i) they are fast to compute and use little memory while remaining accurate for quantiles, and (ii) they can be merged.Footnote 1 A t-Digest summarizes a one-dimensional distribution as a sequence of 'centroid means' and corresponding 'centroid weights'. A basic version of the concept would be to replace the distribution with its percentiles: these percentiles would be a sequence of values (the centroid means) and each percentile would have a weight of 1/100. If we know the percentiles of two sub-distributions we can easily imagine merging these to approximately calculate the percentiles of the full distribution. t-Digests improve on this idea by using a scaling function that imposes smaller weights near the tails of the distribution (with larger weights near the middle/median). Using a scaling function allows accurate calculation of statistics like the 99th or 99.9th percentile while using few centroids for the t-Digest, at very little cost to the accuracy of the median. Merging t-Digests is computationally easy and fast. The literature on computation and big data has developed t-Digests as an efficient implementation of this concept.

A full introduction and explanation of t-Digests and their properties is provided by Dunning and Ertl (2019), and a brief description of their applications in computing appears in Dunning (2021). Typical uses for t-Digests in computing include calculating the quantiles of a dataset that is distributed across multiple servers (compute a t-Digest for each server and then merge them) and detecting outliers in a stream of incoming data (compute a t-Digest, update it periodically with the incoming data, and flag an outlier as a shift in, e.g., the 99th percentile). To our knowledge we provide the first use of t-Digests in Economics.

In our practical example we consider a life-cycle model with N permanent types of agent, indexed by \(i=1,2,\ldots ,N\). A permanent type might be as simple as a different value of a given parameter, or more generally agents of different types might, e.g., have different utility functions or face different processes for the exogenous shocks.

When solving an overlapping-generations (OLG) heterogeneous agent model we will get an agent distribution \(\mu (a,z,j,i)\), where a is the endogenous state (vector), z is the exogenous state (vector), j is the period/age, and i is the permanent type. Our interest is in computing distributional statistics, such as quantiles, of some function of the agent distribution; the agent distribution is multi-dimensional but the function itself is scalar-valued, and so once we evaluate the function on the agent distribution we have a one-dimensional distribution. For example, say a is asset holdings and we are interested in computing the quartiles of the asset distribution, or we are interested in the deciles of the savings rate, \((a'-a)/income\), where \(a'\) is the policy for next period asset holdings. If the agent distribution \(\mu (a,z,j,i)\) fits in memory we can simply load the whole distribution, evaluate the function, and then calculate the quantiles of this distribution. Our interest is in how to proceed when \(\mu (a,z,j,i)\) cannot fit in memory at once.

If we were interested in calculating the mean, we could simply loop over \(i=1,\ldots ,N\); for each i we load \(\mu _i(a,z,j)\), the agent distribution for a specific permanent type i, into memory and calculate the (conditional) mean for this agent type. Having done this for each agent type we can then take a weighted sum of these to get the mean of the whole agent distribution. t-Digests implement this intuition of doing a calculation on each agent type and then merging the results, but in a way that delivers the quantiles: we loop over \(i=1,\ldots ,N\), and for each i we load \(\mu _i(a,z,j)\) into memory and calculate a t-Digest for it, denoted \(tD_i\). Once the loop is completed we have the set of t-Digests, \(\{tD_1,\ldots ,tD_N\}\). We then merge these t-Digests to get a new t-Digest tD, and we are able to calculate the distributional statistics of interest from tD.Footnote 2
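As a concrete illustration of this loop-and-merge workflow, the following Matlab sketch builds one t-Digest per permanent type and then merges them. The argument lists for createDigest() and mergeDigests(), and the helper loadTypeFromDisk(), are illustrative assumptions for this sketch rather than the exact interfaces of the toolkit functions discussed below.

```matlab
N = 5;                        % number of permanent types
ptypeweights = ones(N,1)/N;   % mass of each permanent type (must sum to one)
digests = cell(N,1);
for ii = 1:N
    % load (or compute) the distribution and function values for type ii only
    [mu_i, values_i] = loadTypeFromDisk(ii);   % hypothetical helper
    digests{ii} = createDigest(values_i(:), mu_i(:), 10000);
    clear mu_i values_i                        % free memory before the next type
end
tD = mergeDigests(digests, ptypeweights);      % merged t-Digest across all types
```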

t-Digests are likely to be most useful when using GPUs to solve models with large state-spaces for the agent distribution. For an overview of the heterogeneous agent incomplete markets models where these techniques might be useful, see Heathcote et al. (2009). For an introduction to the use of GPUs in economics see Aldrich et al. (2011). Quantiles for all of these models could be computed without t-Digests by using standard CPU memory. However, the evaluation of the function on the agent distribution can be made an order of magnitude faster by using the GPU, and this can introduce a memory bottleneck as GPU memory is often an order of magnitude smaller than CPU memory.Footnote 3 t-Digests enable us to take advantage of the faster runtimes of the GPU without running into the memory bottleneck imposed by the smaller size of GPU memory relative to CPU memory. While the use of t-Digests themselves imposes an additional computational cost, this cost is negligible compared to the speed gains of using the GPU to evaluate the function on the agent distribution: in our example in Sect. 5, using the CPU to compute the quantiles takes 526 s, compared to 3.21 s using the GPU even though the latter includes the additional steps involved in computing the t-Digests; using the GPU without t-Digests and just directly computing quantiles takes 3.18 s, but would run into GPU memory bottlenecks in larger models.

An alternative to using t-Digests to deal with the memory bottleneck imposed by GPUs when dealing with discretized agent distributions would be to use more parametric approximations of the agent distribution. For example, Algan et al. (2008) use polynomials to approximate the agent distribution.Footnote 4 Another example is Gouin-Bonenfant and Toda (2023), who combine a finite grid with a Pareto tail to approximate the agent distribution in infinite-horizon models where the agent distribution is known to have a Pareto tail, a class of models of interest in studying top-wealth inequality. These more parametric approaches to the agent distribution directly reduce the amount of memory required to store the agent distribution.Footnote 5

In our implementation we compute \(\mu (a,z,j,i)\) as grid points and their corresponding weights, but there is nothing in our use of t-Digests that requires this; t-Digests could be used with parameterized agent distributions. Nor is our division of the whole agent space into subspaces based on i essential. We could divide based on j or any other dimension (or combination of dimensions). Obviously t-Digests can also be applied to infinite horizon models, and models with aggregate shocks. In our implemented example the different agent permanent types will have the same state-space, but this is not necessary for the application of t-Digests.

While we work with the (discretized) agent distribution directly, an alternative solution technique for heterogeneous agent models is to simulate the agents, which generates a sample of data points, and t-Digests could be applied directly to this sample. For example, we might create S simulations in parallel over C CPU cores: each core simulates S/C agents and calculates the t-Digest for those S/C simulations; we then merge the C t-Digests to get the t-Digest for the whole agent distribution, and calculate statistics of interest from it.
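A minimal sketch of this simulation-based variant, again using the same illustrative createDigest()/mergeDigests() signatures as above and a hypothetical simulateAgents() helper (parfor requires the Parallel Computing Toolbox):

```matlab
S = 1e6;  C = 4;                    % total simulations, number of CPU cores
digests = cell(C,1);
parfor cc = 1:C
    sims = simulateAgents(S/C);     % hypothetical: returns S/C simulated draws
    wsim = ones(S/C,1)/(S/C);       % equal weights within this core's sample
    digests{cc} = createDigest(sims(:), wsim, 10000);
end
tD = mergeDigests(digests, ones(C,1)/C);   % each core covers 1/C of the agents
```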

Those using Matlab can directly use our functions, createDigest() and mergeDigests(). These functions are provided as part of VFI Toolkit (Kirkby (2022); vfitoolkit.com), but can be used as standalone functions. Users of VFI Toolkit with models containing permanent types of agents will enjoy the advantages of t-Digests without ever having to touch them directly. Codes implementing the two examples in this paper (Sects. 4 and 5) are available at: github.com/robertdkirkby/tDigestForEcon, which also provides copies of createDigest() and mergeDigests() (duplicates of those in VFI Toolkit).

2 Quantiles of a Function Evaluated on the Agent Distribution

We will use t-Digests to calculate the quantiles (or functions thereof) of the agent distribution in heterogeneous agent incomplete market models. This might be something like the distribution of assets, or earnings, or labor supply.Footnote 6 We now provide a brief technical definition of this, but many readers may simply skip to the next section.

Let X be the state-space for the agent distribution; so for our life-cycle model \(X=A \times {\mathbb {Z}} \times {\mathbb {J}} \times {\mathbb {I}}\). We want to calculate the quantiles Q(p), where \(p \in (0,1)\),Footnote 7 of the value of a function g(x), \(g: X \rightarrow {\mathbb {R}}\), on the agent distribution \(\mu (x)\), \(\mu : X \rightarrow [0,1]\).

The quantile is defined as,

$$\begin{aligned} Q(p)=\min _{{\bar{x}}} g({\bar{x}}) \text { s.t. } p \le \int _{\{x \in X: g(x) \le g({\bar{x}}) \}} \mu (x) \end{aligned}$$
(1)

Note that in our computational approximations to the model there is no issue around using the minimum and maximum rather than infimum and supremum.

Although the agent distribution itself, \(\mu (x)\), is multi-dimensional over (azji) once we evaluate the function g(x) on the agent distribution we can collapse to a single-dimensional version of \(\mu (x)\) by ordering x based on g(x). In codes this is a simple sort operation.Footnote 8 We can then use t-Digests, which only work with a single-dimensional distribution, on this sorted distribution.
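As an illustration, the following Matlab fragment computes Q(p) from Eq. (1) directly on the discretized distribution. Here gvals and mu are assumed to be same-sized arrays holding g evaluated on the grid and the corresponding masses of \(\mu \).

```matlab
[gsorted, order] = sort(gvals(:));   % collapse to one dimension, sorted by g
wsorted = mu(order);                 % reorder the distribution weights to match
cumw = cumsum(wsorted);
p = 0.99;                            % e.g. the 99th percentile
Qp = gsorted(find(cumw >= p, 1, 'first'));
```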

3 t-Digest

A t-Digest can be understood as a data structure created by clustering real-valued samples. Unlike standard clustering methods, the size of each cluster is here limited by a scaling function. By setting the scaling function to create more clusters in the tails of the distribution, t-Digests can achieve high accuracy for top percentiles, like the 99.9th percentile, without having to use many points. This comes at the cost of a minor loss in accuracy in the middle of the distribution for statistics like the median. Each cluster, or bin, is summarized by a centroid value and a weight. This is in contrast to, say, bins defined by minimum and maximum bounds, and it substantially simplifies merging because the bins are not required to be non-overlapping.

To simplify we explain t-Digests based on weighted grids where the observations are in increasing order, and where the sum of the weights is restricted to one.Footnote 9,Footnote 10

Due to the scaling function, only a few samples will be used to construct the bins corresponding to the extreme quantiles, and this is why the estimates of these quantiles remain accurate. t-Digests have an error when estimating the q quantile that is nearly constant relative to \(q(1-q)\); thus this error is small for extreme values of q near zero or one, which is an important distinction between t-Digests and existing alternative data structures for estimating quantiles (Dunning & Ertl, 2019). The scaling function depends on a scaling parameter, \(\delta \), with higher values of \(\delta \) putting more bins near the tails.Footnote 11

While most applications of t-Digests are based on a sample of data points, in our heterogeneous agent models we are working directly with the discretized agent distribution which is a series of grid points and associated weights. We will therefore describe t-Digests based on this approach. An alternative solution technique for heterogeneous agent models is to simulate data from the model, which would generate a sample of data points to which t-Digests could be applied; we refer the reader to Dunning and Ertl (2019) for explanation of the implementation of t-Digests for a sample of data points.

We begin with an ordered grid of b points, \([g_1,\ldots ,g_b]=G \subset {\mathbb {R}}\), together with their associated weights \([w_1,\ldots ,w_b] \subset [0,1]\), \(\sum _{i=1}^{b} w_i =1\). We want to create a t-Digest from this. Consider a partition of this distribution into clusters C, with each cluster summarized by two pieces of information: \({\bar{C}}\), the mean of the cluster, and |C|, the weight of the cluster. The mean \({\bar{C}}\) is defined as the weighted mean of the grid points in C, and the weight of the cluster is the sum of the corresponding weights. As a trivial example we might divide into two clusters \(C_1=[g_1,\ldots ,g_7]\) and \(C_2=[g_8,\ldots ,g_b]\). These would have means of \({\bar{C}}_1=\sum _{i=1}^7 w_i g_i/|C_1|\) and \({\bar{C}}_2=\sum _{i=8}^b w_i g_i/|C_2|\), and weights of \(|C_1|=\sum _{i=1}^7 w_i\) and \(|C_2|=\sum _{i=8}^b w_i\). What is important for t-Digests is that we only need to keep a few numbers, the cluster means and cluster weights, and can drop all information about which points were used to construct them; here we turned 2b pieces of information (b pieces for each of g and w) into just 4 pieces of information (two means and two weights).

Notice that this formulation can easily be used to progressively create any set of quantiles. Let's use the concrete example of calculating four clusters each of size 0.25. Start by building the cluster that will correspond to the first quartile: begin with an empty cluster and then one by one add elements of the grid until the sum of the corresponding weights exceeds 0.25, at which point we stop. Next, for the second quartile we again begin with an empty cluster, and starting from the point at which we stopped previously we keep adding points until the sum of the corresponding weights exceeds 0.25, at which point we stop. The process can clearly be repeated two more times to create the clusters for the remaining two quartiles. We obviously have the weight of each cluster, and as we go we simply keep a running track of the cluster mean, updated each time we add a point to the cluster, while discarding all information about the points themselves (both their grid value and their corresponding weight).
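A minimal Matlab sketch of this quartile construction, using the ordered grid g and weights w introduced above (each cluster keeps only a running weighted mean and a weight):

```matlab
m = 4;                                   % number of clusters (quartiles)
centroidmean = zeros(m,1); centroidweight = zeros(m,1);
cc = 1;                                  % index of the cluster currently being built
for ii = 1:length(g)
    % add point ii to the current cluster, updating its running weighted mean
    neww = centroidweight(cc) + w(ii);
    centroidmean(cc) = (centroidweight(cc)*centroidmean(cc) + w(ii)*g(ii))/neww;
    centroidweight(cc) = neww;
    if centroidweight(cc) >= 1/m && cc < m
        cc = cc + 1;                     % this cluster is full, start the next one
    end
end
```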

What we have so far, if we use m quantiles instead of the four quartiles in our trivial example, is a digest. We implicitly imposed an unnecessary restriction on the clusters, namely in our example that they were all evenly weighted (specifically, as there were four, that they were each of weight 0.25). When building a digest, we do not need to impose even weights, but rather we limit the weight of each cluster consecutively. The remaining piece for a t-Digest is a scale function, so that instead of evenly spaced quantiles we put more clusters near the extremes of the distribution (or equivalently, that the clusters near the extremes have smaller size/weight).

The scale function should be chosen to provide the appropriate trade-off between accurate estimation of the tails of the distribution without weakening accuracy near the median. The scale function will also determine (jointly with the distribution for which we are creating the digest) the number of clusters used. In most applications it is important to keep the number of clusters fairly small, but in our own application to agent distributions it seems appropriate to just use large numbers of clusters, say a few thousand, as the computational costs of the t-Digests are dwarfed by the rest of the heterogeneous agent model.

Fig. 1 Scaling function \(k(q;\delta )\) puts more points near \(q=0\) and \(q=1\)

To limit cluster size, we define the scale function as a non-decreasing function from quantile q to a notional index k with scaling parameter \(\delta \). The scaling function is given by,

$$\begin{aligned} k(q;\delta )=\frac{\delta }{2\pi } \sin ^{-1}(2q-1) \end{aligned}$$
(2)

It is possible to use an alternative scaling function but we have found this one to perform best; see Dunning and Ertl (2019) for alternatives (we are using their \(k_1\) out of four alternatives). As the scaling function k is non-decreasing, the maximum accuracy near the tails of the distribution is influenced by the end-most clusters, as determined by the minimum value \(k(0;\delta )=-\delta /4\) and maximum value \(k(1;\delta )=\delta /4\). The greater the value of \(\delta \), the more clusters will be generated near the tails of the distribution, and more generally the more clusters will be generated in total. We tried \(\delta =10\), 100, 1000, 10000 and 100000; we settled on \(\delta =10000\) as the default in our codes. \(\delta =10000\) led to up to roughly 5000 clusters in our t-Digests created from the agent distribution.Footnote 12 Sect. 4.1 shows how the accuracy of our results in Sect. 4 varies with different values for \(\delta \). Our choice of \(\delta \) is very conservative, in the sense of large numbers of clusters and high accuracy compared to most applications; this comes at a computational expense, but only a very small one, and it seemed appropriate as the run time for creating and merging t-Digests was tiny compared to the other aspects of heterogeneous agent models. Figure 1 plots the scaling function \(k(q;\delta )\) for \(\delta =10000\). As can be seen, it is much steeper near \(q=0\) and \(q=1\). As a result we obtain more clusters (with smaller weights) near the extremes.
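To make the role of \(\delta \) concrete, the following Matlab fragment evaluates the scale function of Eq. (2) and the cluster-size limit it implies: a cluster beginning at cumulative quantile q is allowed to absorb mass only while the notional index k increases by at most one.

```matlab
delta = 10000;
k    = @(q) (delta/(2*pi))*asin(2*q-1);    % scale function, Eq. (2)
kinv = @(y) (sin(2*pi*y/delta)+1)/2;       % inverse of the scale function
fprintf('k(0) = %g, k(1) = %g\n', k(0), k(1));              % -delta/4 and delta/4
% maximum weight of a cluster starting at the median vs near the top tail:
fprintf('limit at q=0.5:   %g\n', kinv(k(0.5)+1)   - 0.5);   % larger limit
fprintf('limit at q=0.999: %g\n', kinv(k(0.999)+1) - 0.999); % much smaller limit
```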

Using the scaling function to control the cluster size, our earlier idea of how to create clusters is implemented as Algorithm 1. It takes an agent distribution (grid points and associated weights) as an input, and creates a t-Digest.
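The following Matlab function is a minimal sketch of this construction, in the spirit of Algorithm 1 rather than a copy of the toolkit's createDigest(); it assumes g is ordered and the weights w sum to one.

```matlab
function [cmeans, cweights] = createDigestSketch(g, w, delta)
% Create a t-Digest (cluster means and weights) from an ordered grid g with
% weights w summing to one; each cluster is limited to spanning at most one
% unit of the notional index k of Eq. (2).
k = @(q) (delta/(2*pi))*asin(2*q-1);     % scale function, Eq. (2)
cmeans = []; cweights = [];
qL = 0;                                  % cumulative weight of closed clusters
cmean = g(1); cweight = w(1);            % open a cluster with the first point
for ii = 2:length(g)
    qR = min(qL + cweight + w(ii), 1);   % guard against rounding above 1
    if k(qR) - k(qL) <= 1
        % point fits: update the running weighted mean of the open cluster
        cmean = (cweight*cmean + w(ii)*g(ii))/(cweight + w(ii));
        cweight = cweight + w(ii);
    else
        % close the current cluster and open a new one with point ii
        cmeans(end+1,1) = cmean; cweights(end+1,1) = cweight; %#ok<AGROW>
        qL = qL + cweight;
        cmean = g(ii); cweight = w(ii);
    end
end
cmeans(end+1,1) = cmean; cweights(end+1,1) = cweight;   % close the last cluster
end
```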

That covers how to create a t-Digest for a specific agent permanent type (or just any subspace of the agent distribution). How do we now merge these t-Digests together? Merging t-Digests can be done in batches. We here just explain how to merge two t-Digests as the generalization to any finite number is trivial.Footnote 13

Say we have two t-Digests from independent samples/subspaces. Each t-Digest is a set of cluster means and corresponding cluster weights. We also need to know the size of each t-Digest, or more accurately of the samples/subspaces they represent. Suppose we have a model with two permanent types of agent, where 0.7 of agents are type 1 and 0.3 of agents are type 2. Our first step is simply to multiply all the cluster weights of the t-Digest for agent type 1 by the weight of agent type 1, namely 0.7. We then multiply all the cluster weights of the t-Digest for agent type 2 by the weight of agent type 2, namely 0.3. After reweighting we can join the two t-Digests together, giving us a set of cluster weights and cluster means. We sort this set by the cluster means, and the resulting ordered set of cluster means and cluster weights is just an ordered set of points and associated weights. We then create a t-Digest from this exactly as if it were any other ordered set of points and associated weights. So the task of merging two t-Digests simply involves taking their weighted union, sorting, and then creating a t-Digest from the result. In the pseudocode we use 'relative weight' to refer to the relative size of the sample/subspace sketched by each t-Digest (we assume that the relative weights sum to one, but t-Digests can be generalized to not require this).
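A minimal sketch of this merge, assuming the two t-Digests are given as cluster means m1, m2 and cluster weights w1, w2 (each set of weights summing to one), and reusing the illustrative createDigestSketch() from above:

```matlab
allmeans   = [m1(:); m2(:)];
allweights = [0.7*w1(:); 0.3*w2(:)];       % reweight by the relative type masses
[allmeans, order] = sort(allmeans);        % sort by cluster mean
allweights = allweights(order);
[mergedmeans, mergedweights] = createDigestSketch(allmeans, allweights, 10000);
```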

We now have the merged t-Digest. Any quantiles of interest can be calculated directly from it, as can statistics like the Lorenz curve and Gini coefficient.
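For illustration, quantiles (and the Lorenz curve and Gini coefficient) can be read off the merged centroids along the following lines; more refined implementations interpolate between neighbouring centroids rather than taking the first centroid whose cumulative weight reaches p.

```matlab
p = 0.99;                                          % e.g. the 99th percentile
cumw = cumsum(mergedweights);
Qp = mergedmeans(find(cumw >= p, 1, 'first'));
% Lorenz curve and Gini coefficient from the same centroids (this assumes the
% underlying values are non-negative, e.g. assets or earnings):
lorenz = cumsum(mergedweights.*mergedmeans)/sum(mergedweights.*mergedmeans);
gini   = 1 - sum(mergedweights.*([0; lorenz(1:end-1)] + lorenz));
```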

There are a number of important concepts for understanding why t-Digests are computationally advantageous. The first few are obvious: we are using just a few points (the cluster means and cluster weights), the only calculations involve keeping track of a mean and a weight as we loop over points, and thanks to the scaling function we have more clusters where accuracy is important to us. More subtle are that t-Digests are weakly ordered (as opposed to strongly ordered) and that they are fully merged. Let's start with what strongly ordered means: a digest is strongly ordered if \(g_i<g_j\) for all \(g_i\in C_i\) and \(g_j\in C_j\), for any \(i<j\) (points in the 'lower' cluster are always less than points in the 'higher' cluster). Algorithms involving bins based on upper and lower limits typically impose strong ordering. Weakly ordered relaxes this: the requirement \(g_i<g_j\) for all \(g_i\in C_i\) and \(g_j\in C_j\) is only imposed on clusters whose indices are at least some offset \(\Delta \ge 1\) apart (that is, for \(i+\Delta <j\)); the individual elements that make up neighbouring clusters need not be strictly ordered across those clusters, although the cluster means will still be ordered. When we create a t-Digest from the original distribution it will be strongly ordered, but once we merge t-Digests they need only be weakly ordered, and this substantially reduces the informational, and hence computational, requirements. This point is worth repeating: using t-Digests we can accurately estimate quantiles without needing to keep the underlying data strictly ordered! Fully merged refers to the fact that, due to the way we construct the t-Digests, it is not possible to combine any two of the clusters in our t-Digest into one and still satisfy the restrictions that we imposed on the weights of the clusters. We might think of fully merged as ensuring we are not 'wasting' any clusters.

[Algorithm 1: creating a t-Digest from grid points and associated weights]
[Algorithm 2: merging t-Digests]

Note that t-Digests do not retain information on the maximum and minimum values, but this could trivially be done alongside the implementation of t-Digests.

4 Trivial Example

We start with a very simple example (the code is provided as tDigest.m). We generate a matrix of uniform [0, 10] random variables containing \(10^6\) observations (so the weight of each observation is \(1/10^6\)). Exact calculation of the median gives 5.0017, while the t-Digest gives 5.0001. Exact calculation of the 99th percentile gives 9.9001, while the t-Digest gives 9.8998.Footnote 14 The t-Digest provides accurate estimates for both the median and the 99th percentile.
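A sketch of this first exercise, using the illustrative createDigestSketch() from Sect. 3 rather than the toolkit's createDigest(); the exact numbers will differ slightly across random draws.

```matlab
x = sort(10*rand(1e6,1));                 % 10^6 draws from uniform [0,10], ordered
w = ones(1e6,1)/1e6;                      % equal weights
[cm, cw] = createDigestSketch(x, w, 10000);
cumw = cumsum(cw);
median_tD = cm(find(cumw >= 0.5,  1, 'first'));   % compare to median(x)
p99_tD    = cm(find(cumw >= 0.99, 1, 'first'));   % compare to x(ceil(0.99*1e6))
```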

We repeat this with a second sample of \(10^7\) observations from a uniform [0, 5] distribution, but rather than giving each of these observations equal weight we instead draw the weights for each point from a uniform [0, 1] distribution. Exact calculation of the median gives 2.4990, and with a t-Digest gives 2.4982. Exact calculation of the 99th percentile gives 4.9498, and using t-Digest gives 4.9496.

We then join these two distributions using relative weights of 0.6 for the first distribution and 0.4 for the second. Exact calculation of the median from the joined sample gives 3.5721, while the merged t-Digest (obtained by merging the two independent t-Digests, not by creating a t-Digest from the joined sample) gives 3.5692. Exact calculation of the 99th percentile gives 9.8327, while the t-Digest gives 9.8318.

We create a third distribution of N(0, 9) normal random variables (with equal weights). Exact calculation of the median gives \(-\)0.0016, while the t-Digest gives \(-\)0.0029. Exact calculation of the 99th percentile gives 6.9853, while the t-Digest gives 6.9812. We join this together with the two previous distributions using relative weights of [0.4, 0.3, 0.3]. Exact calculation of the median from the joined distribution gives 2.5831, while the merged t-Digest gives 2.5798. Exact calculation of the 99th percentile gives 9.7545, while the t-Digest gives 9.7531.

We then try a very different distribution: \(10^6\) points with equal weights (of \(1/10^6\)). The first 0.3 fraction (so \(3\times 10^5\) points) take a value of 4, the next 0.2 take a value of 5, and the remaining 0.5 take a value of 7. Using t-Digests we get 4 for the 29.9th percentile, 5 for the 30.1st percentile, 5 for the 49.9th percentile, and 7 for the 50.1st percentile. We get the same when we do the exact calculation (and as the distribution is so simple, these values are trivially the theoretically correct results).

While these examples are so trivial that using t-Digests is superfluous, as we can just calculate quantiles directly from the distributions themselves, they show that t-Digests are accurate and provide a test of the createDigest() and mergeDigests() commands in Matlab that we provide as part of the contribution of this paper.Footnote 15 The example code tDigest.m also doubles as an interactive introduction to t-Digests, allowing users to play with different scaling functions and scaling parameter values, or to change the sample sizes or even the distributions. Both this example and the following example with the agent distribution use the settings of \(\delta =10,000\) with roughly 5,000 clusters as described in the previous section.

4.1 Hyperparameters of the t-Digests

We now repeat the exercises described in Sect. 4 to demonstrate how changing the hyperparameter \(\delta \) of the t-Digests affects accuracy. Using a larger \(\delta \) means more clusters and therefore more accuracy. The differences in run times associated with the five values of \(\delta \) are measured in hundredths of a second. The memory use in all cases is negligible, being at most a few megabytes.

The six exercises here correspond to those described in Sect. 4. We report the results in Table 1. The column 'precise' gives the results of directly calculating the statistics (largely the same statistics as those in Sect. 4; in most exercises these are the median and the 99th percentile). The column corresponding to \(\delta =10000\) is our default and relates to the results reported so far. As can be seen, a larger \(\delta \) corresponds to higher accuracy. We also include the number of clusters, which is controlled by \(\delta \).

Table 1 How the hyperparameter \(\delta \) influences accuracy

5 Example with Agent Distribution

We now solve a simple life-cycle model with five agent types, for which we can calculate the quantiles of the asset distribution both directly and using t-Digests. This example further demonstrates the accuracy of using t-Digests to calculate quantiles of functions evaluated on the agent distribution. We have made this example simple enough that we can calculate quantiles directly without the use of t-Digests. VFI Toolkit, which we use to create this example, can handle cases where this is not true, and it is for such cases that t-Digests are most useful; however, in those cases we would be unable to directly calculate the quantiles to compare accuracy, so we do not present one here.

We very briefly present the household problem here, with minimal explanation, as the model itself is of only indirect interest. A full description of the model appears in Appendix A. The household's problem is,

$$\begin{aligned} V_i(a,z,e,j) = \max _{c,a',h} \;&\frac{c^{1-\sigma }}{1-\sigma } - \psi \frac{h^{1+\eta }}{1+\eta } + (1-s_j)\beta {\mathbb {I}}_{(j\ge Jr+10)} warmglow(a') \\&+ s_j \beta E[V_i(a',z',e',j+1)|z] \end{aligned}$$
(3)
$$\begin{aligned} \text {if } j<Jr: \; c+a'=(1+r)a+wh \kappa _j \alpha _i z e \end{aligned}$$
(4)
$$\begin{aligned} \text {if } j \ge Jr: \; c+a'=(1+r)a+pension \end{aligned}$$
(5)
$$\begin{aligned} 0\le h \le 1, \; a'\ge 0 \end{aligned}$$
(6)
$$\begin{aligned} \log (z')=\rho _{z} \log (z) + \epsilon , \; \epsilon \sim N(0,\sigma _{\epsilon ,z}^2) \end{aligned}$$
(7)
$$\begin{aligned} e \sim N(1,\sigma _{e}^2) \end{aligned}$$
(8)

where a is assets, c is consumption, and h is labor supply. There are two exogenous shocks: z is AR(1) and e is i.i.d. normal. The effective labor units depend on z and e, as well as a deterministic age-dependent component \(\kappa _j\) and a fixed effect \(\alpha _i\). All additional details of the model appear in Appendix A.

The inclusion of the fixed effect \(\alpha _i\), which takes five possible values, is the dimension over which we will use t-Digests. Value function iteration can be done separately for each value of the fixed effect, and the agent distribution can be calculated separately for each value of the fixed effect. We then evaluate four functions, separately for each value of the fixed effect: (i) the fraction of time worked (h), (ii) earnings (\(w h \kappa _j \alpha _i z e\)), (iii) assets (a), and (iv) the savings rate (\((a'-a)/(r a + wh\kappa _j \alpha _i z e)\)). We then calculate a t-Digest for each of these, merge the five t-Digests, and calculate quantiles of the merged t-Digest; we report the median and the 99th percentile. For the purposes of comparison we alternatively join the five agent distributions and function evaluations together and directly calculate the median and 99th percentile.Footnote 16 The results are shown in Table 2.

Table 2 Accuracy of t-Digests for life-cycle model

We view the t-Digests as accurate, although the reader can of course draw their own conclusion. Our main purpose in introducing t-Digests is to allow us to find the quantiles of larger models for which the exact calculation is not possible.Footnote 17 This example, for which we can perform the exact calculation, reassures us that using t-Digests does provide accurate results. Would the t-Digests remain accurate for larger models? If we consider earnings in the model used here, they take a potential 722,925 different values, and each has an associated weight.Footnote 18 In a larger model we might expect \(10^6\) or \(10^7\) values and associated weights. Note that a larger model with \(10^7\) values and weights then looks a lot like our trivial exercise in Sect. 4, and so the t-Digests would be expected to perform with similar accuracy to that documented there.

The point of t-Digests is to enable us to use the GPU instead of the CPU. How does this impact runtimes? Using the CPU to compute the quantiles takes 526 s, compared to 3.21 s on the GPU, even though the latter includes the additional steps involved in computing the t-Digests; using the GPU without t-Digests and just directly computing quantiles takes 3.18 s, but would run into GPU memory bottlenecks in larger models.Footnote 19

6 Conclusion

t-Digests provide a powerful technique for summarizing distributional information. The loss in accuracy is negligible for most purposes, and the ability to parallelize/subdivide makes t-Digests easy to use and powerful. t-Digests have been implemented in VFI Toolkit as the default method for calculating distributional statistics when using 'permanent types' of agents. This enables handling substantially larger agent distributions. VFI Toolkit thus takes advantage of both the much larger memory of the CPU (to store the full agent distribution) and the faster speed of function evaluation on a grid using the GPU, as the latter can then simply be summarized as a t-Digest. VFI Toolkit uses t-Digests when the full agent distribution will fit in CPU memory but not in GPU memory. t-Digests might also be useful in distributed computation of heterogeneous agent models, e.g. solving each permanent type on a separate node, as they reduce the overhead of what needs to be returned (i.e., just the t-Digests).

We hope that others may find t-Digests useful for heterogeneous agent models, especially for studying the distributional properties of functions evaluated on the agent distribution. The methods are well developed and understood in the computational literature, and their use in Economics is simple to implement.