Imitation is the sincerest [form] of flattery.

— Colton, Charles Caleb (1824)

I quasi-replicate a number of classic papers in Quantitative Macroeconomics. The replications are quasi-replications in two senses: I do not attempt to use the same numerical methods as the original authors to solve the models, and I replicate all the figures and tables relating to the model, but only those.Footnote 1 My interest is not in nitpicking about where the original papers report a ’wrong number’ (whether due to typo, coding error, etc.), and for this reason I relegate all the actual replicated tables and figures to the appendix. The focus of this paper is instead on the lessons to be learned from these replications and on providing some suggestions for best practice based on the experience of performing them.

My main finding is that there is no replication crisis in the Macroeconomics of Quantitative Macroeconomics, but there is a minor crisis in the Quantitative. By this I mean that the major conclusions from all the papers replicated are unchanged, but most of the papers contain some numbers that are incorrect by a magnitude that is quantitatively important.Footnote 2

Replication is typically thought of as relating to data and statistics. So why replicate computational results from Quantitative Macroeconomics? The main reason is exactly the reason underlying the importance of replication for data and statistics: establishing the reliability of existing results. The need to do so follows directly from thinking of computational models as a form of laboratory in which we run experiments (Bona and Santos, 1997). A secondary use for replications follows from the fact that Economists often learn to write code by solving existing models, and replication provides the reliable solutions needed for this.Footnote 3 If anything, simple mistakes may be more common when computing Quantitative Macroeconomic models than in other parts of Economics, as they depend not only on using data and statistics but also require substantial coding. An additional reason is to understand how the choice of numerical methods used to solve models influences Macroeconomics: I document some interesting examples of the importance of this.

Table 1 provides a list of the papers I replicate, with a focus on general equilibrium heterogeneous agent models with incomplete markets. In defense of the selection I simply note that the replicated papers are well cited, with a mean number of citations of 394 in Ideas Repec and of 1167 in Google Scholar as of early 2021 (and a minimum number of citations of 80 and 224, respectively). The codes implementing these replications are all available at github.com/vfitoolkit/vfitoolkit-matlab-replication. Note that this covers a range of ’model-types’ including partial and general equilibrium; finite and infinite horizon, including overlapping generations; stationary equilibrium and transition paths; and agent entry and exit. It also involves analysing a variety of model ’outputs’ including time-series properties, cross-sectional distributions, aggregates, and panel data.

Table 1 Papers Replicated

Replication of these papers was performed using discretized value function iteration, with simulations and agent distributions computed on a discretized state space; these methods have known, reliable convergence properties to the true solution under conditions applicable to a broad class of Macroeconomic models (Kirkby, 2017a, 2019), and they perform well on accuracy in comparisons with other methods (Aruoba et al., 2006; Santos, 2000; Peralta-Alva and Santos, 2014) as long as sufficiently large grids are used.Footnote 4 These discretized grid methods would be inappropriate for the solution of state-of-the-art models where a trade-off between speed and accuracy has to be made. For replication, however, the appropriate focus is on accuracy and robustness at the expense of speed. Discretized grid methods combine high accuracy, as long as large grids are used, with known convergence properties and robustness to a wide range of model properties. While it is impossible to know for certain that the solutions of the replications given here are the true solutions, I am confident that the replicated solutions are accurate as the answers given are insensitive to the grid sizes used; this ’insensitivity’ is given a precise meaning below.Footnote 5

Implementation of the replications makes use of the VFI Toolkit for Matlab (Kirkby, 2017b), which has the advantage that most of the functions making up the replication codes have been widely tested and are therefore hopefully less likely to contain errors.Footnote 6

Table 2 shows, for each replication, the quartiles of the percentage difference between the replication and the original results.Footnote 7 It is based on all the entries of all the Tables from each paper: the absolute percentage difference between the replication value and the value in the original paper was calculated for every table entry, and the quartiles of these are reported. The main weakness of this is that it obviously misses any Figures. To ensure that the replication results are not driven by numerical error, the replications were required to pass the following test: after a ’substantial’Footnote 8 increase in the grid size, the upper quartile of the absolute percentage differences between the results of the two replications (grid and substantially increased grid) must be less than 5%. Note that this is much stricter than it first sounds as, e.g., many papers contain numbers like 0.1, so if this changed to 0.11 with the substantially increased grid this would be a change of greater than 5%. As a result it is believed that the replication numbers are the accurate ones, although this cannot be known for certain as, e.g., a parameter that should be set to 2.4 could instead be set to 2.6 due to a typo. Comparison of the measure across papers should be taken as illustrative rather than definitive, as papers that provide, e.g., a greater breakdown of statistics across different subpopulations will somewhat naturally be likely to display greater numerical error.
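For concreteness, the grid-sensitivity test can be implemented along the following lines (a minimal Matlab sketch, not the exact code used; orig, rep1, and rep2 are hypothetical vectors holding every table entry from the original paper, the baseline replication, and the replication with the substantially increased grid; prctile is in the Statistics and Machine Learning Toolbox):

abspctdiff = @(a,b) 100*abs(a-b)./abs(b);                 % absolute percentage difference
table2row  = prctile(abspctdiff(rep1,orig), [25 50 75]);  % quartiles of the kind reported in Table 2
gridcheck  = prctile(abspctdiff(rep2,rep1), 75);          % upper quartile across the two grids
pass = (gridcheck < 5);                                   % require less than 5% at the upper quartile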

Table 2 Percentage Difference between Numbers in Replication and Original Paper

The only ’substantial’ failure to replicate is the welfare results of early papers. This appears to be explained by their use of linear-quadratic methods, whereas we use non-linear methods to solve the models. For papers such as Imrohoroglu (1989) and Díaz-Giménez et al. (1992) the methods used solved the policy function with enough accuracy that their findings on model statistics related to policies and stationary distributions replicate fine. However those same methods led to highly inaccurate welfare evaluations, as the value functions were not accurately computed. This finding is not entirely novel, but its importance is widely underappreciated. Kim and Kim (2007) show that 1st-order approximation methods deliver incorrect welfare results even when using the correct (to 1st-order) optimal policies (although these errors can be largely avoided by putting the 1st-order solution into the unapproximated welfare function), while Judd et al. (2017) show further that 1st-order solution methods are simply incorrect for many Macroeconomic models, deriving minimum error bounds that are large enough to be troubling. I conjecture that this problem, inaccurate welfare results, is likely widespread in early Quantitative Macroeconomics papers and recommend that any welfare result from pre-2000 should be treated as quantitatively suspect until replicated. The continued widespread use of linear-quadratic methods in Ramsey optimal policy, where maximizing the welfare function is part of the computational exercise, leaves some major open questions about the results of that literature until replication studies are undertaken in that area. Loosely related, first-order (and second-order) perturbation methods have also been shown to give incorrect solutions to the Diamond-Mortensen-Pissarides model of labor market search (Petrosky-Nadeau and Zhang, 2017).Footnote 9

One topic that requires much greater discussion in Quantitative Macroeconomics papers is the discretization of shocks.Footnote 10 Many papers contain a substantial discussion of calibration and some robustness exercises for parameter values. The choice of discretization method, by contrast, rarely receives more than a passing mention, often in a footnote, despite being vastly more important in most models than many parameters. In practice the discretization choices play a key role in determining income risk and the distributions of earnings and wealth. More subtle is the relationship between the exogenous shocks and market incompleteness. Note that in most incomplete market models the incompleteness arises precisely because there are no assets with returns that span the space of idiosyncratic shocks. Hence when the idiosyncratic shocks are small the markets are largely complete, while when idiosyncratic shocks are large markets are very incomplete. The discretization of exogenous shocks, because it determines both the riskiness and the range of the idiosyncratic shocks, therefore also determines the degree of market incompleteness that distinguishes heterogeneous agent models from representative agent models. Quantitative Macroeconomic papers would be much improved by subjecting these choices of shock discretization to the same level of discussion, analysis and sensitivity as any other modelling decision. As an example of their importance, Guerrieri and Lorenzoni (2017) use the Tauchen method to discretize an AR(1) shock in a study of the credit crisis that followed the Great Financial Crisis of 2007.Footnote 11 Just changing the hyperparameter of the Tauchen method to other reasonable values can cause the zero lower bound on interest rates to bind for decades, rather than the few years in the baseline model (and seen in reality).
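To make the role of the hyperparameter concrete, here is a minimal Matlab sketch of the Tauchen method (my own generic implementation, not the Guerrieri and Lorenzoni code); the hyperparameter q sets how many unconditional standard deviations the grid spans, so changing it directly changes the largest shock realizations agents face:

function [zgrid,P] = tauchen(N,rho,sigma_eps,q)
% Tauchen (1986) discretization of z' = rho*z + eps, eps ~ N(0,sigma_eps^2)
Phi = @(x) 0.5*erfc(-x/sqrt(2));                 % standard normal cdf
sigma_z = sigma_eps/sqrt(1-rho^2);               % unconditional std dev of z
zgrid = linspace(-q*sigma_z, q*sigma_z, N)';     % grid spans +/- q unconditional std devs
d = zgrid(2)-zgrid(1);
P = zeros(N,N);
for i = 1:N
    P(i,1) = Phi((zgrid(1)-rho*zgrid(i)+d/2)/sigma_eps);
    P(i,N) = 1 - Phi((zgrid(N)-rho*zgrid(i)-d/2)/sigma_eps);
    for j = 2:N-1
        P(i,j) = Phi((zgrid(j)-rho*zgrid(i)+d/2)/sigma_eps) ...
               - Phi((zgrid(j)-rho*zgrid(i)-d/2)/sigma_eps);
    end
end
end

Comparing, say, q=3 with q=2 for the same rho and sigma_eps changes both the extreme values on the shock grid and the implied unconditional variance, which is exactly the kind of sensitivity the text argues should be reported.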

Replication in Economics Controversy about replication has raged in Psychology, where a project by the Open Science Collaboration to repeat one hundred influential studies was able to successfully replicate the original results in only around 40% of cases (Open Science Collaboration, 2015).Footnote 12 Closer to home for Economists have been controversies about the results of Reinhart and Rogoff (2010) on the relationship between government debt and economic growth, and Miguel and Kremer (2004) on the effects of deworming on education in Kenya.Footnote 13 Within the field of lab experiments in Economics, Camerer et al. (2016) try to replicate 18 studies published in the American Economic Review and the Quarterly Journal of Economics during 2011-2014, and conclude that replication is successful in 60-80% of the papers (depending on the exact metric of ’success’). In a related study Dreber et al. (2015) find that prediction markets in which people can bet on which replications will succeed and fail did well, in the sense that when they predicted a replication would fail it did (when the prediction markets predicted that a replication would succeed this was largely unrelated to the outcome of the replication); this suggests that, informally, the profession is aware of certain existing results that are unlikely to replicate. Ferraro and Shukla (2020) provide evidence suggesting that empirical environmental economics suffers from many of these issues and suggest a variety of ways the profession might adapt and improve.

While replication is important it is not a panacea for all problems.Footnote 14 Even papers that were retracted due to known errors continue to be cited; 20,000 articles listed as retracted by Retraction Watch were still cited 85,000 times after retraction. Other loosely related issues include p-hacking and the bias of publication towards statistically significant results (Brodeur et al., 2016, 2020). The problems don’t just lie with the studies themselves: newspapers rarely report on null-findings and rarely do follow-ups to reporting on results that fail to hold in reproduction studies (Dumas-Mallet et al., 2017).Footnote 15 Replications are also often difficult, expensive, and time-consuming: a recent effort to replicate 50 papers studying cancer, with a budget in excess of $1.3 million, ended up replicating just 18. Certainly, the replications in the present paper consumed a lot of time.

We are not aware of any existing replication study in Quantitative Macroeconomics (beyond the two or three individual replications mentioned in the introduction). The closest is Chang and Li (2015), who look at research transparency or ’the basic goal of computational reproducibility’ (in the words of Miguel, 2021). They take a very different approach: rather than trying to replicate the results of the papers, their interest is in whether the original authors supply codes, and when they do, whether these codes can simply be run to computationally reproduce the results of the papers. A similar approach is taken by Gertler et al. (2018), who find that among 203 papers from top Economics journals, while many provide code, in only 37% of cases did it actually run, and in only 14% of cases was there both raw data and the code that generates the papers’ results (tables and figures) from this data. These approaches are in line with the AEA (American Economic Association) Code and Data Policy,Footnote 16 although the interest of the AEA policy is in ensuring that a study is reproducible, rather than in whether a study has been replicated. While subtle, the distinction is important: a study can be reproducible even though the code or data-treatment contains errors and would fail to replicate; that the original code runs and reproduces the tables and figures in no way tests for the existence of errors in the code itself, although it does make them much easier to detect and resolve.

The current approach to replication in Quantitative Economics, with its focus on reproducibility, obviously misses the issue of whether the original results were themselves correct, which is the main purpose of replication. While availability of code is important, reproducibility is not replication. Replication necessarily involves writing new code, as simply rerunning existing code also reproduces all the errors made in the original treatment of the data and writing of the code. Availability of code is important because code often contains information unintentionally missing from a published paper. For example, papers simply forget to state some initial condition, or the weights used during calibration, or the formula for a certain moment, or the parameter values of a counterfactual exercise, etc.

Zimmermann (2015) suggests the need for a Journal of Replication in Economics as a way to overcome the current status quo in which academics typically receive little to no recognition or reward for performing replications. The area of Applied Econometrics is ahead in this regard, with the Journal of Applied Econometrics having had a Replication Section since 2003. An online effort, ReplicationWiki (Höffler, 2017), hosted by the University of Göttingen, aims to provide a clearinghouse for replications, on the assumption that people already perform replications and simply need some outlet for them. Neither citations nor a large following literature can be relied on as a substitute for replication: oestrogen receptor cycling in the field of breast cancer research was built on two papers, each of which had more than 1000 citations over nearly 20 years, but has now been found to be completely incorrect, with neither of the original papers being replicable (Holding, 2019). Christensen et al. (2019) is a recent book that describes many of these issues, problems, and possible solutions, but with a focus on purely empirical work based on regressions and randomized controlled trials.Footnote 17 It provides a good guide for those interested in improving the reproducibility of their own work.

For Quantitative Macroeconomics researchers interested in trying to ensure that their own computational work is reproducible Sect. 3 presents a checklist, based on my experience with difficulties commonly encountered. This checklist is strictly intended as an aid for researchers, not as a requirement to be imposed. Naturally it will be incomplete but should help researchers who wish to make their work more transparent and reproducible avoid the oversights most common in the literature.

By making replication easier to perform, it is hoped that exercises such as testing the robustness of model predictions and the sensitivity of results to parameters and model specification will also become easier to perform. The development of computational modelling packages such as Dynare, EconARK, GDSGE, niqlow and VFI Toolkit should be viewed as part of contributing to this.Footnote 18 The literature on empirical regressions has begun developing tools to address these issues of specification searching, with a good overview provided by Chapter 7 of Christensen et al. (2019).Footnote 19 Quantitative Macroeconomics would also benefit from such an approach, and simple replication of existing results is a first step on the road to being able to solve models easily enough to make this possible.

The rest of this paper simply describes some general lessons learnt from the process of replicating these papers. Much of what follows might be misread as picking on certain authors/papers by calling out their minor errors. That is far from my intention, which is to understand where common errors are being made and how the profession might do better. The best defense of my intentions is that any author/paper which appears in this work is one I chose to spend a few days of my life replicating because I thought it sufficiently important in the development of Quantitative Macroeconomics.Footnote 20 After all, [replication] is the sincerest form of flattery!

1 Lessons Learned from Common Issues

Some of the main issues encountered during the replications provide lessons for best practice that Macroeconomists can learn from. However the one common pitfall from which there is nothing to be learned is that coding bugs do occur; this appears to have affected a small fraction of the numbers reported in the papers. As a friend expressed it, if you start with n bugs and squash one you are left with n bugs. The main issues, and recommendations based on them, are discussed below. The recommendations are then summarised as a checklist in Sect. 3.

Issue: Graphing Probability Distributions I recommend that researchers plot cumulative distribution functions, rather than probability density functions. Probability density functions can mislead for two reasons: first, they obviously depend on the number of grid points used; second, they appear more sensitive to numerical error. Since many solution methods in quantitative economics involve discretizing shock processes, this leads to very different looking probability density functions when the number of grid points used to discretize the shock changes; loosely, doubling the number of grid points would halve the probability mass at each point.Footnote 21 This issue is minimized, but not entirely eliminated, when using cumulative distribution functions.
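As a minimal illustration (with hypothetical variable names: agrid is an asset grid and StationaryDist the probability mass the model places on it), the recommended cumulative distribution plot is essentially a one-liner in Matlab:

cdf = cumsum(StationaryDist(:));                 % cumulative distribution from the mass function
plot(agrid, cdf); xlabel('assets'); ylabel('cumulative probability');

Unlike the raw mass function, this plot changes little as the number of grid points increases.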

One alternative approach is to parametrize the probability density — say as Chebyshev polynomials, or as a mixture of parametric probability distributions, etc. — but this approach is likely limited if the interest is in, e.g., inequality and the shares of Total Income held by the Top 1%, as the parametrization will implicitly impose some assumptions on these shares.Footnote 22 Comparing a number of alternatives, I concluded that when probability density functions are plotted the best performance comes from graphing kernel-smoothed density functions estimated from the discretized probability mass function.Footnote 23
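A minimal sketch of this, again using the hypothetical agrid and StationaryDist, relies on Matlab's ksdensity (Statistics and Machine Learning Toolbox), with the probability mass supplied as weights:

[f, x] = ksdensity(agrid(:), 'Weights', StationaryDist(:));   % kernel-smoothed density from the pmf
plot(x, f); xlabel('assets'); ylabel('density');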

Issue: Only baseline case parameters are provided Papers essentially always provide all the parameter values for their baseline calibrations (although a few do not report the final values of things such as general equilibrium prices, which would be of much use for replications when trying to understand where differences may be arising). However a number of papers do not report all the parameter values for alternative calibrations, such as those used for ’policy experiments’ or different ’cases’ (e.g., Castaneda et al., 2003 and Hubbard et al., 1994). Such parameter values would be appropriate for inclusion in a technical computational appendix.

Issue: Naming variables Many papers use different names for variables in the paper and in the code, which complicates reading the code for anyone else. Ideally this would not occur, but a more reasonable solution might be the provision of a dictionary wherever it does occur.

Issue: Reporting parameter values Three main problems occur. First, the reported parameter is for a different time-period to the model (e.g., the annual value is reported when the model period is two months). Second, the reported standard deviation is for the stochastic process, but the equations describe it as being for the innovations to that process. Third, parameters that vary over the life-cycle are only reported as a Figure (so exact values are unavailable).Footnote 24 To be more precise about the second of these, many papers will, e.g., have an AR(1) process and describe \(\sigma \) as the standard deviation of the innovations, but then when reporting the calibrated values instead report \(\sigma \) as the standard deviation of the AR(1) process itself. My own suggestion is to use a notation that always makes clear which is which: when a standard deviation is that of the innovations \(\epsilon \) to the AR(1) process z, call it \(\sigma _{\epsilon }\), and when it is that of the AR(1) process itself, call it \(\sigma _z\). This simply helps to differentiate between the two standard deviations, which are otherwise often and easily mixed up by accident during writing.
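As a concrete illustration of how far apart the two can be (standard algebra, not taken from any particular paper): for the AR(1) process \(z_t = \rho z_{t-1} + \epsilon _t\) with \(\epsilon _t \sim N(0,\sigma _{\epsilon }^2)\),

\[ \sigma _z = \frac{\sigma _{\epsilon }}{\sqrt{1-\rho ^2}}, \]

so with, e.g., \(\rho = 0.95\) and \(\sigma _{\epsilon } = 0.1\) one gets \(\sigma _z \approx 0.32\); mixing the two up misstates the size of the income risk by a factor of roughly three.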

Issue: Calibration Details Many papers describe which moments were targeted by the calibration, but do not provide details on how the calibration itself was implemented. In earlier papers this was fine, as most moments were targeted independently, but more recent papers often jointly target a number of moments. This typically means they have implemented a single-objective optimization that assigns each target moment a weight (multi-objective optimization is also a possibility, but based on informal conversations it seems rarely used by Economists). These weights are not typically reported (e.g., Castaneda et al. (2003) do not provide such detail). I suggest that papers should more often include a technical computational appendix which provides this kind of detail. Along the same lines, the initial values from which such optimization starts are almost never given. The availability of codes turned out to be an important factor in mitigating this. For example Auerbach and Kotlikoff (1987) describe the calibrated values of their age-dependent parameter e, but do not explain that these are in fact the log values, and that one must take their exponential and then normalize them so that the age-one value \(e_1\) is set to 1; Figure 5.2 made it clear that something was missing in the original description of the calibrated values of e, and as their codes are available it was easy enough to find out what.
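A minimal sketch of the kind of detail such an appendix could contain (hypothetical names throughout: modelmoments is the researcher's own function mapping parameters into model moments, targets the data moments, weights the weights, and theta0 the initial values):

calibobj = @(theta) sum(weights .* (modelmoments(theta) - targets).^2);  % weighted sum of squared deviations
[thetahat, fval] = fminsearch(calibobj, theta0);                         % theta0 is almost never reported

Reporting weights, targets, and theta0 (plus the optimizer and its tolerances) is enough for someone else to reproduce the calibration step.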

Issue: Availability of Codes In a few cases the original codes are available from the author’s website. In most cases, however, one had to contact the author directly, and even then some authors no longer had the codes (to be fair, some of these papers are from the early 1990s). As an extreme example, the codes for Aiyagari (1994) are unavailable online and the author is deceased. While there is an increasing requirement from journals to provide codes,Footnote 25 the most obvious improvement would be an increased use of github to make codes publicly available; journals that provide their own online code repositories are a perfectly satisfactory substitute/complement. This issue appears to already be well recognized in Economics and is therefore likely ’already solved’, as it were. Current approaches typically have journals provide codes in downloadable zip files, making the process much more onerous than if each Journal simply uploaded all codes to its own github repository or similar; this would make them all instantly searchable and easily accessed and read. The importance of making codes available is the clearest lesson from the replications reported in this paper. Where authors provided codes (often on email request) these were able to resolve many other problems that arose during replication, for many of the reasons described elsewhere in this paper.

Issue: Parameter Robustness and Numerical Approximation Errors Many papers have a ’default’ parametrization and have performed some kind of test to check that their numerical methods are performing well at minimizing numerical error. They then look at how changing parameters would change certain model outputs. Often these parameter changes will, e.g., induce further curvature into certain parts of the solution, and this interacts with the numerical methods to worsen their performance. For example Aiyagari (1994) reports the degree of precautionary savings (e.g., as the resulting interest rate) for various parametrizations. While the results relating to low risk and low risk-aversion are numerically accurate, those relating to high risk and high risk-aversion contain substantial numerical error.

The results of tests for the magnitude of numerical errors, such as Euler Equation residuals (Santos, 2000), are sensitive to the parameter values. This is known from the theory underlying such tests, but the issue is often ignored in practice.Footnote 26 One possibility would be that, when measures of numerical accuracy are presented, they are reported across the full range of parameter values used in the model. An alternative might be for the profession to move more towards the use of adaptive numerical methods, such as those in Brumm and Scheidegger (2017), which assess approximation errors and then update based on them as part of the solution method itself. Both of these suggestions are rather onerous, so for the present simply having researchers more aware of this issue might be the best approach.
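A minimal sketch of the first suggestion, reporting accuracy across the parameter values actually used (solve_model and euler_residuals are hypothetical placeholders for the researcher's own solution routine and accuracy test; the parameter values are purely illustrative):

sigmavec = [1, 3, 5];                               % e.g., the risk-aversion values used in the paper
maxEE = zeros(size(sigmavec));
for ii = 1:numel(sigmavec)
    policy = solve_model(sigmavec(ii));             % hypothetical: solve the model at this parameter value
    maxEE(ii) = max(abs(euler_residuals(policy)));  % hypothetical: accuracy measure at this parameter value
end
disp([sigmavec(:), maxEE(:)])                       % report accuracy for every parametrization used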

Issue: Welfare Evaluations Some of the replicated papers used linear-quadratic methods (Díaz-Giménez, 2001) to solve the value function problem. Replication of these papers often showed high accuracy in variables that depend on the stationary distribution and policy function. However the welfare calculations appear to contain substantial numerical error. It is suspected, but not known, that this reflects that linear-quadratic methods perform fine for computing policy functions but provide a poor approximation of the value function itself. Since welfare calculations are based on the value function itself, they were therefore erroneous. This illustrates how numerical errors in different aspects of the model can be very different. It is common practice to report the results of tests for the magnitude of numerical errors, such as Euler Equation residuals, which look at the policy function. It is important to understand the conditions under which these also imply limited numerical errors elsewhere in the model (Santos, 2000; Fernandez-Villaverde et al., 2006; Kirkby, 2019). In the current instance of errors in the value function under linear-quadratic methods, the theory relating the value function and Euler equation residuals (Santos, 2000) does not apply.Footnote 27

Issue: Formulae for model statistics Typically, when reporting model statistics, papers provide a verbal description of how they are calculated, but rarely include an explicit equation. This led to some difficulties in replication. For example, in Díaz-Giménez et al. (1992) most statistics could be replicated exactly, but a few table entries could not; it seems likely this is simply because I was unable to turn the verbal descriptions into the precise formula. Another example: Restuccia and Urrutia (2004) calculate ’cross-sectional disparity’ as the standard deviation of log earnings, but what is unclear from the written description is that in this two-period OLG model the ’cross-section’ is computed conditional on age being 2, not across the whole model economy. One solution would be to put more formulas in technical appendices; however, this seems overly onerous given that the same issue can largely be solved by improved availability of codes.
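To illustrate how little space such a formula takes, here is my reconstruction (not the original authors' equation) of the statistic in the Restuccia and Urrutia example above:

\[ \text{disparity} = \sqrt{\operatorname{Var}\big(\log (\text{earnings}) \,\big|\, \text{age}=2\big)}, \]

with the variance computed using the model's stationary distribution conditional on age 2 as weights.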

2 Influence of Numerical Methods on Economics

The need for greater discussion in Quantitative Macroeconomics papers of the discretization of shocks — on par with the usual discussion of parameter choices and the sensitivity of results — stems from the large influence these choices have in many models, driving both modelling choices and quantitative results.

Why is exogenous shock discretization so important in incomplete markets models? Because the exogenous shocks and the degree of market incompleteness are essentially the same thing. In most models if the idiosyncratic shocks were zero, then markets would be complete. It is precisely the idiosyncratic shocks that make markets incomplete, and hence the discretization of these exogenous shocks is indirectly determining the degree of market incompleteness and driving the differences of the models from standard representative agent models.

The main discretization methods are quadrature methods for AR(1) shock processes with normally distributed innovations, namely the Tauchen and Rouwenhorst methods (Tauchen, 1986; Rouwenhorst, 1995).Footnote 28 Both perform acceptably in most situations as long as sufficient grid-points are used, although the latter is to be preferred when shocks are highly persistent.Footnote 29 The more recent Farmer-Toda method outperforms both of these, except that Rouwenhorst remains preferable for very highly persistent shocks (Farmer and Toda, 2017). When any of these methods are used, both the grid-size and any hyperparameters need to be reported, and, more importantly, some sensitivity/robustness analysis of these choices should be performed. The most common ’error’ in the literature is simply to choose ‘too few’ grid-points and ignore the large quantitative impact of this in driving results. The same is true for finite-horizon models with AR(1) shock processes with normally distributed innovations where the parameters are age-dependent: the natural extension of the Rouwenhorst method performs best, and the natural extension of the Tauchen method is transparent (Fella et al., 2019). The main point here though is not so much which method is used, but that these choices need to be discussed in papers at least as much as any other calibration choice; they only become irrelevant with grid-sizes of a magnitude never seen in practice.
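For reference, the Rouwenhorst method is short enough to state in full; a minimal Matlab sketch for \(z' = \rho z + \epsilon \) with \(\epsilon \sim N(0,\sigma _{\epsilon }^2)\) (a generic implementation, not the VFI Toolkit code):

function [zgrid,P] = rouwenhorst(N,rho,sigma_eps)
% Rouwenhorst (1995) discretization of z' = rho*z + eps, eps ~ N(0,sigma_eps^2)
p = (1+rho)/2;
P = [p, 1-p; 1-p, p];                                   % two-state case
for n = 3:N
    Pold = P;
    P =     p*[Pold, zeros(n-1,1); zeros(1,n)] ...
      + (1-p)*[zeros(n-1,1), Pold; zeros(1,n)] ...
      + (1-p)*[zeros(1,n); Pold, zeros(n-1,1)] ...
      +     p*[zeros(1,n); zeros(n-1,1), Pold];
    P(2:end-1,:) = P(2:end-1,:)/2;                      % middle rows are double-counted
end
psi = sqrt(N-1)*sigma_eps/sqrt(1-rho^2);                % endpoints: +/- sqrt(N-1) unconditional std devs
zgrid = linspace(-psi, psi, N)';
end

The construction matches the unconditional variance and first-order autocorrelation of the AR(1) exactly, which is why it copes well with high persistence; even so, the number of grid points N deserves to be reported and varied.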

The focus of all of these common discretization methods on normally-distributed shocks also seems misguided. Given that discrete Markov processes will be used to compute the models, why run the data through the straitjacket of an AR(1) process before it reaches the model? Why not go more directly from data to discrete Markov process? This approach allows much more general and realistic shock processes to be used, and is likely to be especially important in any attempt to model income risks, rare disasters (and more broadly the impacts of climate change), and asset prices. Several methods to do this already exist and the literature would be improved by their more widespread adoption; again, alongside more discussion in papers of these discretization choices and their impact on results. Some existing approaches include the quadrature method of Farmer and Toda (2017), which allows more non-parametric approximations such as an AR(1) with gaussian-mixture shocks; the approach of Castaneda et al. (2003), who simply calibrate a four-state Markov chain directly; Toda (2020), which works directly from the raw data; and the use of histograms to create ’bins’ followed by a simple ’count-and-normalize’ of the transitions to implement the maximum-likelihood estimator of a finite-state Markov chain; Kirkby (2017b) explains this last approach in detail for the model of Hansen (1985).
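The last of these is simple enough to sketch directly (a minimal Matlab sketch of the count-and-normalize estimator, assuming a hypothetical data series ydata; quantile is in the Statistics and Machine Learning Toolbox, and the sketch assumes every bin is visited at least once):

nbins = 5;
edges = quantile(ydata, linspace(0,1,nbins+1));           % histogram 'bins' with equal population shares
edges(1) = -inf; edges(end) = inf;
[~,~,state] = histcounts(ydata, edges);                   % bin index of each observation
P = zeros(nbins,nbins);
for t = 1:numel(state)-1
    P(state(t),state(t+1)) = P(state(t),state(t+1)) + 1;  % count observed transitions
end
P = P ./ sum(P,2);                                        % normalize rows: maximum-likelihood transition matrix
zgrid = accumarray(state(:), ydata(:), [nbins,1], @mean); % grid values: within-bin means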

Beyond the choice of how to discretize shock processes, the various other choices of numerical methods and hyperparameters would ideally also be reported and discussed more widely in papers. But given the onerous nature of testing the sensitivity/robustness of these choices, this is probably best left to replication studies using different methods.

One article I would have liked to replicate but did not is Kydland and Prescott (1982). The reason is itself an interesting example of the important role played by the choice of numerical methods, especially those that involve large amounts of approximation. The model of Kydland and Prescott (1982) contains a six-dimensional state variable, making it prohibitively complicated for the discretized value function iteration methods I use in these replications. The model can however be easily solved using the linear-quadratic value function iteration methods used by Kydland and Prescott (1982), which involves solving for a small number of coefficients rather than a full six-dimensional object. This is because using linear-quadratic value function iteration methods means that the full distribution of the shocks does not matter for evaluating expectations of next period’s value function, only their conditional mean.

The issue of the use of linear and log-linear, and first- and second-order perturbation in welfare evaluations has already been described in the Introduction. The results of Judd et al. (2017) showing that the minimum error bounds on linear, log-linear, and first-order approximations are large enough to be problematic for most Economic models should dissuade Economists from using them in any application. This is especially true thanks to the implementation of second-order and higher methods in many available codebases (including Dynare). Users should also be aware that first-order methods imply only the conditional mean matters for expectations, and that with second-order only the conditional mean and conditional second moment matter; this means they are, e.g., simply unusable for any study of the impact of rare events/disasters or conditional changes in volatility. Wherever possible Economists should be making greater use of global non-linear solution methods.Footnote 30

One potential issue we have not addressed is the uniqueness of general equilibrium. This is not thought to be a problem for any of these papers, but nor is it known for most of them that it is not a problem. Aiyagari (1994) provides a good illustration: (i) we know that the equilibrium is unique if the CES parameter (\(\sigma \) in \(\frac{c^{1-\sigma }}{1-\sigma }\)) is less than one (Light, 2020), which is mathematically interesting but not an economically relevant calibration; (ii) we know that there is more than one equilibrium for some calibrations (Acikgoz, 2018), again mathematically interesting but not an economically relevant calibration (the calibration that displays multiple equilibria has a capital-output ratio of 50, while the empirically realistic range is 3-8); (iii) we know ’almost certainly’ that for standard calibrations Aiyagari (1994) has a unique general equilibrium: an algorithm proven to converge to any and all equilibria with certainty, but then stopped after finite time, finds just one equilibrium, hence ’almost certainly’ (Kirkby, 2019). So while multiple equilibria are a theoretical concern, they do not appear to be of concern in the present replications, and none of the replication results suggest that there was a problem. Note that this issue is even stronger for general equilibrium transition paths, where not only is uniqueness not known, but existence is not theoretically established either.

3 A Checklist for Reproducibility

Table 3 is provided to act as a simple checklist that researchers interested in ensuring reproducibility of their work can use. It is not intended to be comprehensive, but should make it easier to avoid the omissions that are common in the existing literature.

Table 3 Checklist for Reproducibility

4 Conclusions

We end simply with an exhortation on the importance of reproducibility of results in Economic Science, and in Science more generally: “Non-reproducible single occurrences are of no significance to science.” — Popper, Karl (1934, The Logic of Scientific Discovery)