How Valid are Synthetic Panel Estimates of Poverty Dynamics?

A growing literature uses repeated cross-section surveys to derive 'synthetic panel' data estimates of poverty dynamics statistics. It builds on the pioneering study by Dang, Lanjouw, Luoto, and McKenzie (Journal of Development Economics, 2014) providing bounds estimates and the innovative refinement proposed by Dang and Lanjouw (World Bank Policy Research Working Paper 6504, 2013) providing point estimates of the statistics of interest. We provide new evidence about the accuracy of synthetic panel estimates relative to benchmarks based on estimates derived from genuine household panel data, employing high quality data from Australia and Britain, while also examining the sensitivity of results to a number of analytical choices. Overall, we are more agnostic about the validity of the synthetic panel approach applied to these two rich countries than are earlier validity studies in their applications focusing on middle- and low-income countries.


NON-TECHNICAL SUMMARY
The prevalence of moves into and out of poverty has long been a topic of interest for social policy. We would like to know the proportion of the population that is always poor over a period of time, what fraction move into or out of poverty, and what fraction are never poor. Estimation of these poverty dynamics statistics requires data in which individuals are followed over time, i.e. data from household panel surveys or linked administrative record data. In many developing countries, these longitudinal data are simply unavailable or of poor quality.
Faced with the combination of policy interest and data problems, researchers have developed innovative statistical methods to derive estimates of the poverty dynamics of interest from cross-sectional household survey data, so-called 'synthetic panel' methods.
Virtually all the applications of these methods to date have been to developing countries.
This research includes a small number of validation studies that benchmark the synthetic panel estimates against the 'true' estimates that can be derived from genuine household panel survey data. These validation studies conclude that synthetic panel methods work quite well.
Our research provides new evidence about the accuracy of synthetic panel estimates relative to benchmarks based on estimates derived from genuine household panel data, employing high quality data from Australia and Britain, while also examining the sensitivity of results to a number of analytical choices.
Our analysis shows that the DL method works less well when it is applied to Australian and British data than when it is applied to data for middle- and low-income countries. For our two countries, we also demonstrate that the quality of the estimates is sensitive to analytical choices about the way in which the methods are implemented. These choices include sample selection criteria such as the age range of the household heads and the level at which the poverty line is set. We also show that the accuracy of the synthetic panel estimates can depend on the particular time period that is analysed.
We discuss the extent to which our findings are generally applicable and to which they arise from having analysed two particular rich countries for which high-quality panel data exist. We set out a number of topics for further research to help explain the differences between our findings about validity and those of other validation studies (which focused on developing countries).

Introduction
There is a growing literature that employs repeated cross-section surveys to derive 'synthetic panel' data estimates of poverty dynamics statistics, building on the pioneering study by Dang, Lanjouw, Luoto, and McKenzie (2014, hereafter 'DLLM') providing bounds estimates and the innovative refinement proposed by Dang and Lanjouw (2013, hereafter 'DL') providing point estimates of the statistics of interest. All but one of the many applications of these methods to date have been to middle- and low-income countries. DLLM focus on statistics summarising poverty status in two years, e.g. the joint probability of being poor in one year and poor in the second year, and develop methods that provide both non-parametric and parametric bounds on this probability and each of the three other joint poverty probabilities. They assess the validity of their approach using panel data: synthetic estimates, derived by treating the panel data as two independent cross-sections, are compared with the 'true' estimates derived from the longitudinal data per se (more about this below). DLLM conclude, using data for Indonesia and Vietnam, that 'the bounds can be narrow enough in practice to make the estimates useful' (DLLM: 124). Cruces et al. (2015) assess the DLLM bounds method using data from Chile, Nicaragua, and Peru, incorporating extensive examination of the sensitivity of their results to a number of analytical choices about definitions, and conclude that 'the methodology performs reasonably well' (2015: 163).
Perez (2016) also assesses the DLLM bounds method but using Mexican data. DL refine the DLLM method and derive point estimates of poverty probabilities rather than bounds. Their empirical analysis, based on data for Bosnia-Herzegovina, Laos, Peru, Vietnam, and the USA, supports the validity of their refined method: '[W]e show that our estimates are quite accurate … We find that estimation results are good not only for the general population but for smaller population groups as well, and are associated with much tighter confidence intervals than even direct, panel-data based estimates in those settings where the sample sizes for the cross sections are large enough' (DL: 36). The only other validation study of the DL method to date is by Garcés Urzainqui (2017).
In sum, previous research suggests that synthetic panel methods produce valid estimates and there are now many applications to a large number of countries in the developing world. However, there is a need for further validation studies of synthetic panel methods of estimating poverty dynamics statistics. This paper meets that need and makes a number of contributions in addition.
There are few validation studies of the DL method. We assess the DL method in detail, also incorporating sensitivity checks to a number of analytical choices about definitions analogous to the way in which DLLM and Cruces et al. (2015) assessed the performance of the DLLM method. We also provide DLLM parametric bounds estimates.
We add substantially to analysis of the DL and DLLM methods in rich country contexts with our study of Australia and Britain; DL's US case study is the only previous rich country application of the synthetic panel approach. Our paper is the first to consider how variations in the age range of the household head used to define analysis samples affect results for a given country. (Age ranges vary across earlier studies but not within them.) We also consider the impact of using different definitions of the cohorts used to derive the parameter ρ, which is a fundamental ingredient of the DL method (explained below). Only Garcés Urzainqui (2017) has done this before.
Our research is also distinctive because we examine, for the first time, the sensitivity of the DL method to the choice of the poverty line, which turns out to be important. DLLM and Cruces et al. (2015) examined the corresponding sensitivity of the DLLM bounds approach (not the DL one). In addition, we derive poverty dynamics estimates for subgroups of individuals defined by age (0-17, 18-59, and 60+ years).
Further distinctive features of our work are as follows. For all sets of analytical choices about definitions, we derive estimates of poverty exit and entry rates, i.e. estimates of two conditional probabilities, in addition to estimates of the four joint probabilities that have been the focus of previous work. In this respect, we are taking up the challenge of Fields and Viollaz (2013: 20) who argue that it is these conditional probabilities that are more relevant to the essence of 'poverty dynamics' and who contend that the DLLM method estimates them less accurately. DL provide conditional probability estimates, noting that they are 'slightly less accurate' than the joint probability estimates (p. 33). We provide new and more extensive evidence about whether estimates of conditional poverty probabilities are more or less accurately estimated than joint probabilities.
In addition, because the BHPS and HILDA are much longer-running household panels than those for any developing country (we use data collected annually over 18 years for the BHPS, and over 15 years for HILDA), we can provide a detailed assessment of the extent to which the accuracy of synthetic panel estimates of poverty dynamics statistics varies according to the year or period studied. This turns out to be important.
Finally, we consider the benchmarks used to assess the accuracy of the synthetic panel estimates. As we discuss below, and as has not been pointed out before, the 'true' benchmarks employed by DL differ from those that are typically used in 'standard' panel data approaches to poverty dynamics. We show how the benchmark estimates shift if one changes the following rule and consider the implications for assessments of the accuracy of synthetic panel estimates.
Although the application of DLLM and DL methods has focused on developing countries, there is value in assessing their validity in rich country contexts even though panel data are more common. For example, there are long-standing concerns about attrition in the longitudinal data components of the EU's Statistics on Income and Living Conditions (EU-SILC), i.e. the sources used to calculate the EU's measure of persistent poverty. Among the countries using household panel surveys to collect data about income and poverty annually over a four-year period, there is substantial loss to follow-up. For example, Jenkins and Van Kerm (2017: Figure 22.1) show that around one half of the countries using surveys have four-year (2008-2011) retention rates of less than 70%, with the smallest rate just over 40% (UK).
By comparison, the four-year retention rates following the first waves of high-quality panels such as the BHPS and HILDA are nearly 80% (Watson and Wooden 2011, Figure 3).
The rest of our paper unfolds as follows. In Section 2, we review how the DLLM and DL methods work, and point out the key analytical choices that are required to implement them. In Section 3, we describe our HILDA and BHPS data and explain our various definitional choices. We report our empirical results in Sections 4 and 5. Section 4 examines the accuracy of estimation of the cross-year correlation parameter (ρ, i.e. 'DL rho') that underpins the DL method, and which is derived using pseudo-panel methods. We show that, depending on the definition of the cohorts and the age range of the household head used to define the analysis sample, DL-method estimates of ρ can vary substantially depending on the time period considered and also be very different from the 'true' panel data benchmark estimates.
In Section 5, we report our assessments of the validity of DL method estimates of joint and conditional poverty statistics, focusing on a 'leading case' set of choices relating to the definition of cohorts, sample selection (the age range of household heads), and the poverty line. In Section 6, we document how assessments change as we vary, in turn, choices about the 'true' panel benchmark (the following rule issue), the age range of household heads, and the poverty line. Finally, we look at estimates of poverty dynamics statistics for three age groups of individuals (aged 0-17, 18-59, and 60+ years). Our conclusions are in Section 7. Overall, we are more agnostic about the validity of applications of the synthetic panel approach in the context of these two rich countries than the earlier validity studies are about their applications, which focused on middle- and low-income countries.
For brevity we report only a selection of results in the main text. Appendices A (HILDA) and B (BHPS) in the Supplementary Material report the estimates of the DL(LM) income model regressions, the means of the income predictors, and kernel density plots of predicted log incomes compared to a normal distribution benchmark, year by year. We also provide a full set of estimates of all poverty dynamics statistics for each of the 28 different combinations of definitions we use. Estimates of the cohort regressions used to derive DL rho (see Section 2) are available from the authors on request.

How do the DLLM and DL methods work?
With cross-sectional survey data for a pair of years (Year 1, Year 2), one has information about the marginal distributions of income in each year. (The outcome variable might be consumption rather than income; we refer to the latter.) Clearly, there is no information about the joint distribution of income in the two years, nor thence information about poverty dynamics. The DLLM and DL methods work by using a model and associated assumptions to fill in the missing longitudinal information. In this section, we review the key elements of the two methods, drawing heavily on the original expositions.
The first step, common to DLLM and DL methods, is an income model for each of Year 1 and Year 2. Suppose that income $y_{it}$ for household head i in year t is described by
$$\log(y_{i1}) = \beta_1' x_{i1} + \varepsilon_{i1} \qquad (1)$$
$$\log(y_{j2}) = \beta_2' x_{j2} + \varepsilon_{j2} \qquad (2)$$
where $x_{i1}$ and $x_{j2}$ are vectors of time-invariant predictors in Years 1 and 2.
The Year 1 income for each household head j observed in Year 2 is unobserved, but it can be predicted using model estimates and two auxiliary assumptions. Ordinary least squares regression applied to each of (1) and (2) [...] Year 1 (with mean 0, s.d. $\hat{\sigma}_{\varepsilon_1}$) and predict the outcome in Year 1 using the expression
$$\log(\hat{y}_{j1}) = \hat{\beta}_1' x_{j2} + \check{\varepsilon}_{j1},$$
where $\check{\varepsilon}_{j1}$ is the residual imputed to each j. We now have synthetic panel data from which poverty dynamics statistics can be calculated. Although the observation unit is the household head, application of appropriate survey weights (that also account for the number of individuals in each head's household) provides estimates referring to the population. To counter the variability introduced by the stochastic nature of the imputations, one repeats the random-draw-and-calculation step R times and averages the resulting estimates. We find that setting R = 50 is sufficient.
DLLM also proposed a parametric bounds approach in order to narrow the bounds, arguing that non-parametric bounds for the poverty dynamics statistics may be wide and hence not particularly useful in practice. DLLM's key additional assumption is that the distribution of errors in (1) and (2) is bivariate normal with cross-year correlation ρ [...]
$$\mathrm{Prob}(y_{j1} < z_1 \text{ and } y_{j2} < z_2) = \Phi\left(\frac{z_1 - \beta_1' x_{j2}}{\sigma_{\varepsilon_1}}, \frac{z_2 - \beta_2' x_{j2}}{\sigma_{\varepsilon_2}}, \rho\right) \qquad (3)$$
where Φ(.) is the bivariate normal cumulative distribution function, and $z_1$ and $z_2$ are the poverty lines in Years 1 and 2 (measured on the same scale as the modelled outcome) [...]
DL build on DLLM's parametric bounds approach, innovatively showing that one can derive a point estimate for ρ (and thence for each of the poverty statistics of interest) from the data already to hand rather than by relying on auxiliary estimates from other surveys to provide bounds. (There are also a number of other extensions, including applications to income dynamics over more than two years, and to mobility between more than two income classes, but we do not examine these aspects here.) DL's key insight is that pseudo-panel [...] and also sufficiently large cohort sizes to ensure sufficient precision.
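The parametric calculation described above can be illustrated with a minimal sketch. It assumes bivariate normal errors, poverty lines expressed in logs, and known values of the predicted log incomes and residual standard deviations. The function names (`bvn_cdf`, `joint_poverty_probs`) and all input values are hypothetical, and the numerical integration is a convenience chosen here, not a description of the software used by DLLM or DL.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, s.d. 1


def bvn_cdf(a, b, rho, steps=2000):
    """P(U < a and V < b) for a standard bivariate normal pair with correlation rho.

    Uses P = integral over u <= a of phi(u) * Phi((b - rho*u) / sqrt(1 - rho^2)),
    evaluated with Simpson's rule; the lower limit is truncated at -8.
    Requires -1 < rho < 1 and an even number of steps.
    """
    lower = -8.0
    if a <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    width = (a - lower) / steps

    def integrand(u):
        return STD_NORMAL.pdf(u) * STD_NORMAL.cdf((b - rho * u) / scale)

    total = integrand(lower) + integrand(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lower + k * width)
    return total * width / 3.0


def joint_poverty_probs(m1, m2, s1, s2, z1, z2, rho):
    """Four joint poverty probabilities for one household head.

    m1, m2: predicted log incomes in Years 1 and 2;
    s1, s2: residual standard deviations; z1, z2: log poverty lines.
    """
    a = (z1 - m1) / s1
    b = (z2 - m2) / s2
    p1 = STD_NORMAL.cdf(a)  # P(poor in Year 1)
    p2 = STD_NORMAL.cdf(b)  # P(poor in Year 2)
    pp = bvn_cdf(a, b, rho)
    return {
        "poor_poor": pp,
        "poor_nonpoor": p1 - pp,
        "nonpoor_poor": p2 - pp,
        "nonpoor_nonpoor": 1.0 - p1 - p2 + pp,
    }


# Hypothetical inputs, for illustration only.
args = dict(m1=10.0, m2=10.1, s1=0.6, s2=0.6, z1=9.4, z2=9.5)
point = joint_poverty_probs(rho=0.7, **args)
# Bounds on the poor-poor probability from an assumed range for rho.
bounds = [joint_poverty_probs(rho=r, **args)["poor_poor"] for r in (0.5, 0.9)]
```

Evaluating the function at the two ends of an assumed range for ρ gives bounds in the spirit of the parametric bounds approach, while a single value of ρ gives a point estimate. Population-level statistics would then be obtained by averaging the head-level probabilities using the survey weights.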
Second, DL show in their Proposition 2 that the all-important cross-year correlation of residuals, ρ, can be derived from $\rho_{y_{i1} y_{i2}}$ and other information already to hand from the income models: [...]
This is the 'DL rho' estimate that we report below. DL also show that another estimate of ρ can be derived (Corollary 2.1), though they state (p. 13) that it typically provides very similar estimates. We find this too, and so do not report this other estimate.
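To see why ρ is recoverable from information of this kind, the following sketch works directly from models (1) and (2). It assumes that the predictors are time-invariant and uncorrelated with the errors; the symbol $\Sigma_x$ for the covariance matrix of the predictors and the shorthand $\rho_{y_1 y_2}$ for the cross-year correlation of log incomes are notation introduced here for illustration. It is not a reproduction of DL's own statement or proof.

```latex
\begin{align*}
\operatorname{cov}(\log y_{i1}, \log y_{i2})
  &= \beta_1' \Sigma_x \beta_2 + \operatorname{cov}(\varepsilon_{i1}, \varepsilon_{i2})
   = \beta_1' \Sigma_x \beta_2 + \rho\,\sigma_{\varepsilon_1}\sigma_{\varepsilon_2}, \\[4pt]
\rho
  &= \frac{\rho_{y_1 y_2}\sqrt{\operatorname{var}(\log y_{i1})\,\operatorname{var}(\log y_{i2})}
           - \beta_1' \Sigma_x \beta_2}
          {\sigma_{\varepsilon_1}\sigma_{\varepsilon_2}}.
\end{align*}
```

Every term on the right-hand side except $\rho_{y_1 y_2}$ can be estimated from the two cross-sections and the fitted income models. The cross-year correlation of incomes cannot be observed without panel data, which is why it has to be approximated, for example from cohort-level (pseudo-panel) data.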
Given the estimate of ρ, estimates of poverty dynamics statistics can be derived using the same approach as set out for the DLLM parametric bounds approach: cf. (3).
[...] (2011), and is the definition used by the OECD and Eurostat to produce their inequality and poverty statistics.
To include a very small number of zero values for income, we follow DL (p. 6) and apply a modified Box-Cox transformation to observed incomes.
We set the poverty line at 60% of contemporary national median income in most of our analysis, but also consider 50% of contemporary national median income as an alternative.
The 60% cut-off is used by UK official statistics and Eurostat to derive their 'headline' poverty statistics. Australia and the OECD have commonly used the 50% threshold. DL(LM) mostly use official poverty lines in their studies and so we are following them in this respect.
An important difference is that our poverty lines are relative lines whereas the official poverty lines for the countries that DL(LM) consider are absolute poverty lines typically derived with reference to calculations of minimum cost food and non-food budgets.
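A relative poverty line of this kind can be computed from a single cross-section. The sketch below is illustrative only: it assumes incomes and survey weights are held in plain lists, uses a simple weighted-median rule (the first income at which cumulative weight reaches half the total), and all names and numbers are hypothetical.

```python
def weighted_median(values, weights):
    """Smallest value at which cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("no observations supplied")


def poverty_line(incomes, weights, fraction=0.6):
    """Relative poverty line: a fraction of the contemporary weighted median."""
    return fraction * weighted_median(incomes, weights)


def headcount_rate(incomes, weights, line):
    """Weighted share of observations with income strictly below the line."""
    poor_weight = sum(w for y, w in zip(incomes, weights) if y < line)
    return poor_weight / sum(weights)


# Hypothetical incomes and weights, for illustration only.
incomes = [100.0, 200.0, 300.0, 400.0, 500.0]
weights = [1.0, 1.0, 1.0, 1.0, 1.0]
line_60 = poverty_line(incomes, weights, fraction=0.6)
line_50 = poverty_line(incomes, weights, fraction=0.5)
rate_60 = headcount_rate(incomes, weights, line_60)
```

Because the line is recomputed from each year's median, it moves with the income distribution, unlike an absolute line fixed in real terms.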
We undertake analysis using samples selected according to two definitions of the age of the household head: 25-55 and 25-75 years. There are two issues here. One is the need to ensure the time-invariance assumption is satisfied. The second concerns the treatment of household formation and dissolution, but this is more an issue concerning the definition of the benchmark 'true' panel estimates of poverty probabilities and the panel survey's following rule (the issue we flagged in the Introduction).
Relevant to both issues is the question of the age ranges in which household formation and dissolution are most prevalent. By comparison with developing countries, in rich countries, households with heads aged 55-75 may be more stable than those with a head aged 25-55 because divorce and partnering are more common among the latter group, and longevity is greater. In any case, there is a separate argument in favour of using as wide an age range as possible because this provides estimates with greater coverage of the population and includes groups of policy interest such as elderly people.
The predictors that we include in the Year 1 and 2 income models are much the same as those employed by DL (see their Appendix 2 [...] (5) and YOB (10).
Numbers of cohorts and cohort size for each definition and country are shown in Tables 1 and 2.

Estimates of DL rho using pseudo-panel methods
We report estimates of ρ for each cohort definition and sample selection criterion, as well as information about numbers of cohorts and cohort size, in Table 1 (HILDA) and Table 2 (BHPS). We do not show the results for every Year 1-Year 2 pair here; instead, and in order to highlight differences arising from cohort definitions and sample selection, Tables 1 and 2 show averages across year-pairs [...]
[...] (5)) of the three is the 'true' panel rho tracked well by DL rho and, even in this case, there are some noticeable differences between them in the very earliest and very latest year-pairs. For the other three combinations, differences are markedly larger, and it is clear the choice of household head's age range is the principal contributor to the differences between the estimates. In the two charts on the right-hand side of Figure 1(a), DL rho differs substantially from the 'true' panel estimate, especially in the second half of the period, and fluctuates substantially. In addition, the DL rho estimate generally declines over time, contrary to the slight rise in the 'true' panel rho.
For Britain (Figure 1(b)), many of the same patterns are apparent. The main differences compared to Australia are, first, that there is slightly more variation over time in the estimates of the 'true' panel rho. Second, the differences between estimates of DL rho and the 'true' panel rho are not as large as those for Australia, even for the two combinations with household head aged 25-55. This suggests that BHPS synthetic panel estimates of poverty dynamics statistics are likely to be more accurate than their HILDA counterparts, and less sensitive to the choice of definitions and sample selection.
Why there are large changes in DL rho estimates over time is unclear. There were no changes in HILDA or BHPS design over the period that would explain this; nor are the changes correlated with, for example, changes in average cohort size or some other feature of the cohort regressions.
A more general lesson from Figure 1 is that the accuracy of DL rho estimates depends on the precise years considered. (This is also clear from DL's results, as shown by the 'relative difference (%)' summaries reported in their Table 2 for countries with more than one year-pair of estimates, but our more complete coverage of long time periods for each country makes the finding more manifest.) For each cohort definition and head's age range in our [...]
In the next section, we put these issues to one side, and use the estimated values of DL rho, along with the other parameters of the income models, to derive estimates of poverty dynamics statistics.

Synthetic panel estimates of poverty dynamics statistics: leading case
In this section we provide synthetic panel estimates of the four joint poverty probabilities and two conditional poverty probabilities, for Australia and Britain. We focus on a 'leading case' set of definitions, and in Section 6 consider the impact on the estimates and their validity of changes to these definitions. Our leading case is based on the combinations of sample selection criterion and cohort definition that provide estimates of DL rho that are the closest to the 'true' panel value (see Figure 1). This maximizes the chances that the DL methods estimates are accurate, other things being equal. Thus, the leading case is based on the following criteria: household head is aged 25-75; the cohort definition is COB*YOB (5) for Australia and Sex*YOB(5) for Britain; the poverty line is 60% of contemporary median income; and the estimates refer to 'all individuals'.
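The two conditional probabilities follow arithmetically from the four joint probabilities. The sketch below shows the conversion; the function name and the input values are hypothetical and are not estimates from HILDA or the BHPS.

```python
def exit_and_entry_rates(p_pp, p_pn, p_np, p_nn):
    """Convert four joint poverty probabilities into exit and entry rates.

    p_pp: poor in both years; p_pn: poor in Year 1, non-poor in Year 2;
    p_np: non-poor in Year 1, poor in Year 2; p_nn: non-poor in both years.
    """
    total = p_pp + p_pn + p_np + p_nn
    if abs(total - 1.0) > 1e-6:
        raise ValueError("joint probabilities must sum to 1")
    exit_rate = p_pn / (p_pp + p_pn)    # P(non-poor in Year 2 | poor in Year 1)
    entry_rate = p_np / (p_np + p_nn)   # P(poor in Year 2 | non-poor in Year 1)
    return exit_rate, entry_rate


# Hypothetical joint probabilities, for illustration only.
exit_rate, entry_rate = exit_and_entry_rates(0.10, 0.06, 0.05, 0.79)
```

Because the exit rate divides by the relatively small share of people who are poor in Year 1, a given absolute error in a joint probability translates into a larger absolute error in the exit rate than in the entry rate.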
Our leading case estimates of the four joint poverty probabilities for each year-pair are shown in Figure 2 (HILDA) and Figure 3 (BHPS), together with related benchmarks to compare them with. Figure 4 shows the estimates of poverty exit and entry probabilities for both countries. Each figure has the same format. We show DLLM parametric bounds estimates, assuming 0.5 < ρ < 0.9. Although these ρ bounds differ from those used by DLLM for developing countries, they are consistent with the 'true' panel estimates that DL report for the USA (Table 2) and with Tables 1 and 2, and Figure 1 above. The black dots labelled 'parametric estimate' are the DL-method probability estimates, derived using the approach discussed earlier.
We also show what the estimates would be were the derivation undertaken using the 'true' panel rho rather than DL rho. The benchmarks for assessing the accuracy of the DL-method estimates are shown by the 'true' estimates and their pointwise 95% confidence intervals (dark grey band).
Consider first the HILDA estimates of the joint exit probability shown in the bottom-left panel of Figure 2. The parametric bounds estimates fluctuate slightly from one year-pair to the next, but are consistently between around 4% and 9%, a range of some 5 percentage points, and hence relatively wide. The DL parametric estimates also fluctuate somewhat, but tend to lie in the middle of the bounds estimates (apart from at the end of the period): the values are around 7% to 8%. If the 'true' panel rho had been known, the estimates of the joint probability would have been quite similar, except at the very beginning and at the end of the period, which is when the DL rho and 'true' panel rho estimates differ the most (see Figure 1(a)). The similarities between the series are a reminder that the accuracy of the DL-method probability estimates is also contingent on the income model predictions and related assumptions, a point to which we return in Section 7.

<Figure 2 near here>
We assess the accuracy of the DL-method estimates of the joint exit probability by considering whether they lie within the 95% confidence interval of the 'true' estimates. It is clear from Figure 2 that, for the vast majority of year-pairs (11 out of 14), the DL estimate is outside the 95% confidence interval of the 'true' estimate. On average, it is around 2 percentage points larger than the benchmark 'true' joint probability (more at the end of the period).
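The accuracy criterion used here amounts to a simple tally across year-pairs. A minimal sketch follows; the function name and all numbers are hypothetical and chosen only to show the mechanics, not to reproduce the figures.

```python
def coverage_summary(estimates, benchmarks, ci_bounds):
    """Count estimates inside the benchmark 95% CI and report the mean gap.

    estimates: synthetic panel estimates, one per year-pair;
    benchmarks: corresponding 'true' panel point estimates;
    ci_bounds: (lower, upper) confidence limits for each benchmark.
    """
    inside = sum(
        1 for est, (lower, upper) in zip(estimates, ci_bounds)
        if lower <= est <= upper
    )
    mean_gap = sum(e - b for e, b in zip(estimates, benchmarks)) / len(estimates)
    return inside, len(estimates), mean_gap


# Hypothetical year-pair values, for illustration only.
synthetic = [0.075, 0.080, 0.060, 0.078]
true_point = [0.055, 0.058, 0.057, 0.056]
true_ci = [(0.048, 0.062), (0.051, 0.065), (0.050, 0.064), (0.049, 0.063)]
inside, n_pairs, mean_gap = coverage_summary(synthetic, true_point, true_ci)
```

The count treats each year-pair separately, in line with the pointwise confidence intervals shown in the figures; it does not account for sampling variability in the synthetic estimates themselves.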
For the other three joint probabilities, the headline message regarding accuracy is mixed. For the joint persistence probability (top-left panel of Figure 2), the DL-method estimates are accurate in the sense that they lie within the 95% confidence interval of the corresponding 'true' estimate in 11 out of the 14 comparisons. The DL estimates of the joint persistence probability tend to be slightly smaller than their 'true' counterparts, but both show a small decline from around 11% at the beginning of the 2000s to around 10% just over a decade later.
However, although the DL method estimates the joint persistence probability relatively accurately, it does not do so for the other two joint probabilities. The 'true' joint probability of being non-poor in two consecutive years increases by around 4 percentage points over the period, from around 76% to around 80%. By contrast, the DL estimates fluctuate but show no rising trend. The DL estimates are within the 95% confidence interval of the 'true' estimates for only four of the 14 year-pairs (all of which are at the start of the period). The joint entry probability is also inaccurately estimated, with only two of the 14 DL-method estimates within the benchmark confidence band. The 'true' estimate fluctuates around 5% to 6% and the DL-method estimate is somewhat larger. For 2006 and the three years at the end of the period, the DL-method estimate is up to 2 percentage points greater than the upper bound of the confidence interval, which is a large gap when assessed relative to the 'true' point estimate.
The findings for the BHPS contain both differences and similarities to the HILDA ones: see [...]
[...] Viollaz (2013) as cited in the Introduction, but consistent with DL, we do not find that poverty exit and entry rate estimates are markedly less accurate than joint probability estimates. Our results for conditional and joint probabilities have more similarities than differences. Benchmark confidence bands tend to be larger, especially for exit rates, reflecting the smaller sample numbers 'at risk' that underlie the calculations. But we also see that BHPS exit and entry rates are more accurately estimated than HILDA exit and entry rates. As well, for both countries, inaccuracies tend to be more prevalent towards the end of the time periods covered, and the DL-method estimates of exit and entry rates tend to be larger than the corresponding 'true' estimates.

<Figure 4 near here>

Synthetic panel estimates of poverty dynamics statistics: variants
Our results so far show that the accuracy of the DL method depends on the time period considered and the country context. But these findings are contingent on a number of definitional assumptions. In this section, we consider the robustness of our conclusions to variations in definitions around the leading case considered so far. For brevity, we show only a selection of our results here; the complete set is provided in the Supplementary Material.

Changing the 'true' panel benchmark
The DL(LM) approach is implemented at the household level, as explained in the [...]
We answer this question focusing on estimates of poverty exit and entry rates. Whether changing the benchmark definition would make a difference in other contexts is difficult to assess. Taking account of household change is likely to raise poverty entry rate estimates rather than exit rate estimates in most countries compared to the 'true' panel approach. This is consistent with the well-known finding for rich countries that household change is particularly associated with poverty entries rather than poverty exits (Bane and Ellwood 1986, Jenkins 2011).
We leave open for further research the issue of what is the appropriate benchmark to use to assess the accuracy of synthetic panel estimates and, for the rest of this paper, we return to using the 'true' panel estimates given our DL reference point.

Changing the household head's age range
We now consider the impact of narrowing the age range of the household head from 25-75 to 25-55, but we retain all other definitions associated with our leading case. Figure 6 displays estimates for HILDA (panel a) and BHPS (panel b) and is directly comparable with Figure 4 (based on the wider household head age range). [...] and 'Parametric est. true rho' series in Figure 6 and see also the two corresponding bottom figures in Figure 1(a).
For the BHPS, use of the narrower age range also leads to poorer quality estimates of poverty exit and entry rates, but the effect is not nearly as marked as in the Australian case.
The deterioration in quality is also related to the poorer accuracy of DL rho estimates relative to the leading case, but the effects are not as large as for HILDA (Figure 1, panel (b)).
The Supplementary Material contains a complete collection of figures showing the impact of using the 25-55 age range compared to the 25-75 one, and for joint probabilities as well as the conditional probabilities discussed here. The appendices confirm that changing the definitions away from the leading case scenario generally leads to less accurate estimates of all poverty dynamics statistics.

Changing the poverty line to 50% of the contemporary median
We now consider the impact of changing the poverty line to 50% of contemporary national median income (from 60%), but otherwise retaining all other leading case definitions. Figure 7 shows the joint probability estimates for HILDA and Figure 8 shows them for the BHPS. Figure 9 shows the estimates of poverty exit and entry probabilities for both countries.
Compare these figures with Figures 2, 3 and 4 respectively for the leading case.
The most important finding is that, with the lower poverty line, the DL-method estimates for Australia are more accurate. For example, the number of estimates of the joint exit probability that lie within the benchmark 95% confidence interval increases from 3 out of 14 to 10. For the joint persistently non-poor probability, the corresponding numbers are an increase from 3 to 9 and, for the joint entry probability, an increase from 2 to 7. There are corresponding improvements in the accuracy of the DL-method estimates of poverty exit rates (the number within the reference confidence band increasing from 2 to 10) and of poverty entry rates (the number increasing from 2 to 7): see Figure 9(a).

<Figures 7, 8, and 9 near here>
There is also some increase in estimate accuracy associated with using the lower poverty line in the British case, though the effect is less noticeable. Indeed, the joint poverty persistence probability is now less accurately estimated by the DL method, with the number of estimates lying in the reference 95% confidence band falling to 9/17 compared to 17/17 in the leading case scenario. There is no change in the number of estimates of the joint non-poverty persistence probability within the reference band (9/17), but the number increases to 11 from 9 for the joint exit probability and to 13 from 8 for the joint entry probability. See Figure 8.
The accuracy of the DL-method estimates of poverty exit rates also improves compared to the leading case scenario (the number within the reference confidence band increasing from 9 to 14) and of poverty entry rates (the number increasing from 9 to 13): see Figure 9(b).
Our analysis demonstrates that the accuracy of the DL-method estimates of poverty dynamics statistics is sensitive to the choice of poverty line. DLLM undertook extensive analysis of the robustness of their non-parametric bounds method to a wide range of poverty lines, for Indonesia and also Vietnam, and they conclude that 'our approach is found to work well for the full possible range of poverty lines that might be specified' (DLLM: 122) [...] countries. An interesting task for future research is analysis of the robustness of the DL method to poverty line choice in developing country settings.

Estimates for population subgroups
DL argue that their method provides good estimates for population subgroups as well as the population as a whole, using regional breakdowns to illustrate their case. Here we consider the accuracy of subgroup estimates using breakdowns by age, reflecting rich country policy interest. See Figure 10. Our overall conclusion regarding the accuracy of subgroup estimates is more equivocal than DL's. In our analysis, some of the estimates for some subgroups and for some statistics (conditional or joint probabilities) are more accurate than the corresponding estimates for all individuals. At the same time, some are noticeably less accurate.

Conclusions
Our analysis shows that the DL method works less well when it is applied to Australian and British data than when it is applied to data for middle- and low-income countries. To what extent are our findings generally applicable and to what extent do they arise from having analysed two particular rich countries for which high-quality panel data exist?
Clearly, there is scope for further research that is relevant for developing as well as developed countries. For example, it would be useful to know whether some of the sensitivities of the DL method estimates that we have found, such as to the choice of head's age range and to the poverty line, are also present in other country contexts. We also need further research comparing the DL method with other synthetic panel approaches such as that proposed by Bourguignon and Moreno (2015). Garcés Urzainqui (2017) is the only study to do this to date.
Aside from data availability and quality issues, there is the question of whether the synthetic panel approach is less applicable to high-income countries than to middle- or low-income countries because the underlying assumptions are less appropriate or the income modelling does not work so well in high-income country contexts. In this connection we note that, of the five countries used in DL's validation study, the synthetic panel estimates are slightly less accurate for the one rich country they included (USA): compare the 'goodness of fit' statistics in their Tables 3-5.
Key assumptions in DLLM's parametric bounds and DL's point estimate approaches are that Year 1 and Year 2 incomes are bivariate lognormally distributed and that income predictors are time-invariant (see Section 2). However, our checks of these assumptions based on the same methods as DL indicate that these issues are no greater a problem in HILDA and the BHPS than they were for DL's study countries (see the Supplementary Material). These checks do not consider the way cross-year income dependence is characterised, and it may be that a different approach from bivariate normality, e.g. involving copula specifications for dependence, is more fruitful.
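To illustrate why the characterisation of cross-year dependence matters, the following sketch shows how joint and conditional poverty probabilities follow from bivariate normality of log incomes (equivalently, bivariate lognormality of incomes). It is an illustration under stated assumptions rather than the DL estimator itself: `a` and `b` are hypothetical standardized log poverty lines for Year 1 and Year 2, `rho` is the cross-year correlation, and a single pair of thresholds is used, whereas an application would typically have thresholds that vary with the income predictors. The bivariate normal distribution function is computed by simple numerical integration using only the standard library.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bvn_cdf(a, b, rho, steps=2000, lower=-8.0):
    """P(X <= a, Y <= b) for a standard bivariate normal with |rho| < 1.

    Uses P = integral over x <= a of phi(x) * Phi((b - rho*x) / sqrt(1 - rho^2)),
    approximated with Simpson's rule on [lower, a] (steps must be even).
    """
    if a <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(x):
        return norm_pdf(x) * norm_cdf((b - rho * x) / scale)

    h = (a - lower) / steps
    total = integrand(lower) + integrand(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lower + i * h)
    return total * h / 3.0


def poverty_dynamics(a, b, rho):
    """Joint and conditional probabilities for standardized poverty lines a, b."""
    p_poor1, p_poor2 = norm_cdf(a), norm_cdf(b)
    poor_poor = bvn_cdf(a, b, rho)
    poor_nonpoor = p_poor1 - poor_poor            # joint exit probability
    nonpoor_poor = p_poor2 - poor_poor            # joint entry probability
    nonpoor_nonpoor = 1.0 - p_poor1 - p_poor2 + poor_poor
    return {
        "poor_poor": poor_poor,
        "poor_nonpoor": poor_nonpoor,
        "nonpoor_poor": nonpoor_poor,
        "nonpoor_nonpoor": nonpoor_nonpoor,
        "exit_rate": poor_nonpoor / p_poor1,            # P(non-poor in Year 2 | poor in Year 1)
        "entry_rate": nonpoor_poor / (1.0 - p_poor1),   # P(poor in Year 2 | non-poor in Year 1)
    }


# Hypothetical inputs: both poverty lines one standard deviation below the mean.
result = poverty_dynamics(a=-1.0, b=-1.0, rho=0.7)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Under this set-up, all four joint probabilities and both conditional rates depend on the single parameter `rho` once the marginal poverty rates are fixed, which is one reason why a more flexible dependence specification could lead to different estimates.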
Regarding the success of the income modelling in different country contexts, we note that the adjusted R² values of our income regressions based on HILDA and BHPS data are lower than those that DLLM and DL report.

Table columns (cohort definitions): (1); YOB(5); Sex*YOB(5); Sex*YOB(10); COB*YOB(3); COB*YOB(5); COB*YOB(10).
Notes: Period covered: 2001/2002 to 2014/2015. The estimates for each year-pair are based on disjoint subsamples representing averages over 50 sample splits. Shown in parentheses are the standard deviations of the corresponding rho estimates, calculated over all 50 sample splits and 14 year-pairs. Each column represents a different cohort definition, where YOB(s) = year of birth (at intervals of s years), COB = country of birth (Australia, English-speaking country, non-English-speaking country). All estimates are weighted and adjusted for survey design, with the exception of the number of cohorts and average cohort size (which are unweighted). Panel rho: 'true' correlation between residuals of Year 1 and Year 2 income model (see eqn. 3 in main text). Cohort R²: adjusted R² of the regression of income on cohort variables.

Synthetic panel estimates of joint and conditional probabilities, by analytical choice
The 'leading case' estimates are case #1.