Estimating poverty for refugees in data-scarce contexts: an application of cross-survey imputation

The increasing growth of forced displacement worldwide has brought more attention to measuring poverty among refugee populations. However, refugee data remain scarce, particularly regarding income or consumption. We offer a first attempt to measure poverty among refugees using cross-survey imputation and administrative and survey data collected by the United Nations High Commissioner for Refugees (UNHCR). Employing a small number of predictors currently available in the UNHCR registration system, the proposed methodology offers out-of-sample predicted poverty rates that are not statistically different from actual poverty rates. These estimates are robust to different poverty lines, perform well according to targeting indicators, and are more accurate than those based on asset indexes or proxy means tests. They can also be obtained with relatively small samples. We additionally show that it is feasible to provide poverty estimates for one geographical region based on existing data from another similar region.


Introduction
The sharp growth in the global count of forcibly displaced people during the past decade has created new challenges for host governments and aid organizations that will require a new approach to the measurement of poverty. Host governments are keen to know the number and status of refugees living in their countries, as they struggle to maintain internal order while assisting the newcomers. Humanitarian organizations charged with managing displacement crises are confronted with increasing financial needs and, when these needs are not met by donors, with budget cuts and a shift from universal assistance to means-tested targeting. The increasingly protracted nature of displacement also challenges development organizations to design sustainable poverty reduction programs for displaced people and host communities. For all these actors, measuring poverty among displaced populations has become a key ingredient of any effective economic policy. It is also increasingly clear that achieving the first Sustainable Development Goal (SDG 1), poverty reduction, will not be possible if the forcibly displaced are excluded from the count.
Measuring poverty among refugees is not an easy task. It is more complex than for regular populations because refugees are more mobile and often live in areas that are difficult to reach due to environmental or security barriers. Indeed, the global count of the poor mostly excludes displaced populations, because these populations are not usually captured by censuses and, as a consequence, are largely left out of consumption surveys, the main instruments used to measure poverty. The various challenges of micro survey data collection, such as survey administration, sampling, questionnaire design, and funding, are exacerbated for displaced populations, and it will take years of effort to meet the poverty measurement standards that we are now accustomed to seeing in (most) low-income countries. Not surprisingly, studies on refugee poverty are very rare. Refugee studies tend to focus either on the impact of refugees on host communities (see, e.g., Verme and Schuettler (2021) for a review) or on the impact of various policies, including aid, on refugees (see, e.g., Alix-Garcia et al. (2019) or Alloush et al. (2017)).
Organizations such as the United Nations High Commissioner for Refugees (UNHCR) and the World Bank are now fully committed to bridging this data gap, but past experiences with measuring poverty in low-income countries suggest that this is going to be a long-term process. For example, the UNHCR has attempted to collect consumption data for the Syrian refugees in Jordan using large-scale surveys that interview as many as 5000 households per month (or 60,000 households per year). In other refugee contexts, where fewer resources and more logistical challenges exist, such large-scale surveys may not be feasible or sustainable. In this paper, we make several new contributions to the poverty measurement literature. First, we address the data challenge in the refugee context by demonstrating that it is feasible to apply cross-survey imputation to obtain poverty estimates for refugees. In particular, we combine census-type administrative data that have no consumption measures with household consumption survey data collected by the UNHCR on Syrian refugees in Jordan in 2014. We then employ a recently developed cross-survey imputation method to estimate poverty among these refugees. To our knowledge, this is the first experiment of its kind. Poverty studies that use cross-survey imputation methods have become more frequent (see, e.g., Dang et al. (2019) for a recent review), but none of these studies has shed any light on refugee populations.
Second, we show that it is feasible to provide imputation-based poverty estimates for one geographical location based on the imputation model from another. This question has more practical relevance than one might think. It is well known among survey practitioners that data often cannot be collected for a location for reasons beyond one's control, such as roads made inaccessible by unexpected natural calamities (e.g., floods, storms, or landslides), or by conflict and violence. In the context of refugees, aside from these occurrences, even temporarily volatile security situations may prevent data collection in specific locations. Or prohibitively expensive survey costs may simply rule out data collection at a specific location. In these cases, if the welfare variable exists for another geographical location that is comparable to the location without these data, we can employ our proposed technique to provide imputation-based poverty estimates for the latter location.
Finally, we provide theoretical and new empirical evidence that relatively small survey samples can be combined with the census-type registration system to provide updated estimates of poverty. Moreover, our imputation models are rather parsimonious and use variables that are already available in the UNHCR's administrative database, which is consistent with recent findings on imputation-based poverty estimates for regular populations.
Our findings show that the imputation-based poverty estimates are not statistically different from the non-predicted consumption-based poverty rates (henceforth, the "true" poverty rate), and even fall within one standard error of the latter in quite a few cases. This result is robust to various validation tests, including alternative poverty lines, disaggregated population groups, and different modelling assumptions. Furthermore, these poverty estimates are found to have smaller standard errors than other poverty measures based on asset indexes or proxy means testing. They also perform better than average for standard targeting indicators such as coverage and leakage rates.
While our estimation results are encouraging, a note of caution is necessary. Our study focuses on Syrian refugees in Jordan because the available data were particularly suitable for testing the proposed methodology. Validating this methodology will clearly require further supportive evidence from other countries, refugee groups, or other sources of data. However, if our proposed imputation method is further validated, it can offer a cost-effective and logistically efficient way to obtain poverty estimates in data-scarce environments.
The remainder of the paper consists of four sections. We discuss the basic theory and analytical framework in the next section. Section 3 then provides the country background, a description of the data, and the empirical results for imputation for the whole population and from one geographic location to another. This section also offers various robustness tests to alternative poverty lines, disaggregated population groups, and a stronger modelling assumption. Section 4 discusses further methodological challenges related to survey sample sizes, and other related poverty measures such as asset indexes, proxy means tests, and targeting ratios. We conclude in Section 5.

Analytical framework
Where consumption data are either incomparable across two survey rounds or missing in one round but not the other, but other characteristics ($x_j$) that can help predict consumption are available in both rounds, we can apply survey-to-survey imputation methods. These methods mostly build on Elbers, Lanjouw, and Lanjouw's (2003) seminal study, which imputes household consumption from a survey into a population census to measure poverty, an approach commonly known as "poverty mapping." Various studies subsequently adapted this approach to survey-to-survey imputation for poverty estimates, such as Christiaensen et al. (2012) for China, Kenya, the Russian Federation, and Vietnam, and Mathiassen (2013) for Uganda. In this paper, we apply the imputation framework of Dang et al. (2017), which builds on the earlier survey-to-census imputation approach (Elbers et al. 2003; Tarozzi 2007), to provide poverty estimates for Syrian refugees in Jordan. Compared to previous studies, Dang et al.'s (2017) method provides a more explicit theoretical modeling framework, with new features such as model selection and the standardization of surveys with different designs (e.g., for imputing from a household survey into a labor force survey). This technique has recently been applied (and validated) using multiple survey rounds from several African countries, India, Tunisia, and Vietnam (Beegle et al. 2016; Cuesta and Ibarra 2017; Dang and Lanjouw 2018; Dang et al. 2019, 2021). We briefly describe this imputation method below before discussing its extension to the refugee context. Let $x_j$ be a vector of characteristics representing the main observable factors that determine a household's consumption, where j indicates the survey type. More generally, j can indicate either another round of the same household expenditure survey or a different survey (census), for j = 1, 2.
The index j can also denote any other survey that collects household data relevant for imputation purposes, such as a labor force survey or a demographic and health survey. Subject to data availability, $x_j$ can include household variables such as the household head's age, sex, education, ethnicity, religion, language (which can represent household tastes), occupation, and household assets or incomes. Occupation-related characteristics can include whether the household head works, the share of household members who work, the type of work that household members do, as well as context-specific variables such as the share of female household members who participate in the labor force, or some variables at the region level. Other community or regional variables can also be added, since these can help control for different labor market conditions. The following linear model is typically employed in empirical studies to project household consumption on household and other characteristics ($x_j$):

$$y_j = \beta_j' x_j + \nu_{cj} + \varepsilon_j \qquad (1)$$

where $\nu_{cj}$ is a cluster random effect, $\varepsilon_j$ is the idiosyncratic error term, and $y_j$ is household consumption, typically modeled in log form. We suppress the subscript indexing households to keep the notation uncluttered. Conditional on household characteristics, the cluster random effects and the error terms are usually assumed to be uncorrelated with each other and to follow a normal distribution, such that $\nu_{cj} \mid x_j \sim N(0, \sigma^2_{\nu j})$ and $\varepsilon_j \mid x_j \sim N(0, \sigma^2_{\varepsilon j})$. While the normality assumption yields the standard linear random effects model, which is convenient for mathematical manipulation and computation, it is not necessary for this type of model: as shown later, we can drop this assumption and use the empirical distribution of the error terms instead, at the cost of somewhat more computing time. For convenience, we refer to the survey for which we want to impute poverty estimates as the target survey (j = 2), and the survey on which we can estimate Eq. (1) as the base survey (j = 1). The target survey is usually more recent (or offers more disaggregated information, as in the case of a census) and has no consumption data, while the base survey is usually older and has consumption data.

Assuming that the explanatory variables $x_j$ are comparable across both surveys (Assumption 1), Dang et al. (2017) define the imputed consumption $y_2^1$ as

$$y_2^1 = \beta_1' x_2 + \nu_{c1} + \varepsilon_1 \qquad (2)$$

and estimate it as

$$\hat{y}_{2,s}^1 = \hat{\beta}_1' x_2 + \tilde{\nu}_{c1,s} + \tilde{\varepsilon}_{1,s} \qquad (3)$$

where the parameters $\hat{\beta}_1'$ (and the distributions of $\nu_{c1}$ and $\varepsilon_1$) are estimated using Eq. (1), and $\tilde{\nu}_{c1,s}$ and $\tilde{\varepsilon}_{1,s}$ represent the s-th random draw from these estimated distributions, for s = 1, …, S. Using the same notation as in Eq. (3), and letting z denote the poverty line, the poverty rate $P_2$ in survey (or period) 2 and its variance can then be estimated as

$$\hat{P}_2 = \frac{1}{S}\sum_{s=1}^{S}\hat{P}_{2,s}, \qquad \hat{P}_{2,s} = P\left(\hat{y}_{2,s}^1 < z\right) \qquad (4)$$

$$V(\hat{P}_2) = \frac{1}{S}\sum_{s=1}^{S} V\left(\hat{P}_{2,s} \mid x_2\right) + \frac{1}{S-1}\sum_{s=1}^{S}\left(\hat{P}_{2,s} - \hat{P}_2\right)^2 \qquad (5)$$

The intuition behind this poverty imputation method is that we predict the consumption variable in the target survey based on the estimated consumption parameters (and the error terms) and their distributions from Eq. (1). Once we obtain the predicted (distribution of the) consumption variable, we use it to estimate the poverty rate as in Eq. (4).
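To make Eqs. (3) and (4) concrete, the Python sketch below simulates imputed log consumption in the target survey and averages the simulated poverty rates. It assumes that the coefficients and the two error standard deviations have already been estimated on the base survey under the normality assumption, holds the coefficients fixed across draws, and computes an unweighted household-level headcount. The function and argument names are hypothetical; this is an illustration of the technique, not the authors' Stata routine.

```python
import numpy as np

def impute_poverty_rate(x_target, cluster_target, beta, sigma_nu, sigma_eps,
                        z, n_sims=1000, seed=0):
    """Simulated poverty rate in the target survey (sketch of Eqs. 3-4).

    x_target: (n, k) regressors in the target survey (incl. a constant column)
    cluster_target: (n,) cluster labels in the target survey
    beta: (k,) coefficients estimated on the base survey (held fixed here)
    sigma_nu, sigma_eps: estimated std. devs. of the cluster effect and error
    z: poverty line on the same (log) scale as consumption
    """
    rng = np.random.default_rng(seed)
    xb = x_target @ beta                                   # linear prediction
    clusters, idx = np.unique(cluster_target, return_inverse=True)
    rates = np.empty(n_sims)
    for s in range(n_sims):
        # one cluster-effect draw per cluster, shared by its households
        nu = rng.normal(0.0, sigma_nu, size=len(clusters))[idx]
        # one idiosyncratic draw per household
        eps = rng.normal(0.0, sigma_eps, size=len(xb))
        y_hat = xb + nu + eps                              # Eq. (3)
        rates[s] = np.mean(y_hat < z)                      # headcount for draw s
    return rates.mean(), rates                             # Eq. (4) and the S draws
```

Returning the individual draws as well as their mean makes it straightforward to compute the between-simulation component of the variance in Eq. (5).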
The variance of the estimated poverty rate in Eq. (5) consists of two components: the sampling error (the first term on the right-hand side) and the modelling error (the second term on the right-hand side). If the regression model fits well, the sampling error is likely to be larger than the modelling error. The variance $V(\hat{P}_2)$ is closely related to Rubin's (1987) variance formula, except that Rubin's formula contains an additional component due to simulation error. For this reason, Dang et al. (2017) recommend using a large number of simulations to make this component negligible. We follow their recommendation and use 1,000 simulations (i.e., S = 1000) to obtain our estimates. We also provide robust standard errors for the estimated poverty rate $\hat{P}_2$, clustered at the district level.
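The two variance components, and why the Rubin simulation term becomes negligible for large S, can be sketched as follows. This assumes the S simulated rates and their sampling variances are already available; how each sampling variance is computed (for example, clustered at the district level) is left outside the sketch, and the function name is hypothetical.

```python
import numpy as np

def combine_simulation_variance(rates, within_vars):
    """Combine S simulated poverty rates into a point estimate and variance.

    rates: the S simulated rates P_hat_{2,s}
    within_vars: the S sampling variances V(P_hat_{2,s} | x_2)
    """
    rates = np.asarray(rates, dtype=float)
    within_vars = np.asarray(within_vars, dtype=float)
    S = len(rates)
    p_hat = rates.mean()                # point estimate, as in Eq. (4)
    sampling = within_vars.mean()       # first term: average sampling variance
    modelling = rates.var(ddof=1)       # second term: between-simulation variance
    total = sampling + modelling        # variance as in Eq. (5)
    # Rubin (1987) inflates the modelling term by (1 + 1/S); the extra
    # modelling / S piece is the simulation-error component, tiny for S = 1000
    rubin = sampling + (1 + 1 / S) * modelling
    return p_hat, total, rubin
```

With S = 1000, the Rubin total exceeds the Eq. (5) total by only 0.1% of the modelling component, which is the sense in which the simulation-error term becomes negligible.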
For imputation between two surveys implemented in different periods, Dang et al. (2017) make an additional assumption that the changes in $x_j$ between the two periods can capture the change in the poverty rate in the next period (Assumption 2). Since we use administrative and survey data collected by the UNHCR in the same year, this assumption can be restated as: the differences in $x_j$ between the two data sources fully capture any difference in the poverty rates estimated from these data sources (Assumption 2'). As discussed later, however, since the household survey data are a subset of the administrative data, Assumption 2' is satisfied by design in our case. In summary, Assumptions 1 (and 2') are practically equivalent to, but somewhat weaker than, the assumption that the distributions of $\beta_j$, $\nu_{cj}$, and $\varepsilon_j$ are the same for both the administrative and survey data. As discussed in Dang et al. (2017), while we can specify Eqs. (1) and (2) as a simple OLS model (i.e., with the random effects $\nu_{cj}$ subsumed into the error terms), modelling the random effects explicitly helps improve the precision of the estimates. The random effects model has an advantage over the OLS model in capturing between-cluster variation, thanks to the additional information offered by the cluster random effects. Put differently, $\nu_{cj}$ is instrumental not only in estimating $\beta_j$ but also in estimating poverty in survey 2, as a component of predicted household consumption. Also, unlike the traditional econometric model that estimates the impacts of $x_j$ on $y_j$, our focus is on predicting $y_j$ conditional on $x_j$. As such, endogeneity of $x_j$ is a far less important concern, if a concern at all, in our context.
Note also that in contexts where few explanatory variables $x_j$ are comparable between the two surveys (say, when we impute from a household consumption survey into a labor force survey), the role of the random effects $\nu_{cj}$ is even more important. In this case, explicitly modelling the random effect term $\nu_{cj}$ can better control for the larger variation due to unobserved cluster characteristics that are not available in both surveys. Indeed, empirical evidence from various countries including Jordan and Vietnam suggests that the estimated variance of $\nu_{cj}$ tends to be larger where the regression based on Eq. (1) has a lower goodness-of-fit (i.e., a lower $R^2$) (Dang et al. 2017, 2019). We provide a more detailed description of the imputation procedures and the user-written Stata routine in Appendix 1, Part A.
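How residual variance splits into a cluster component ($\sigma^2_{\nu}$) and an idiosyncratic component ($\sigma^2_{\varepsilon}$) can be illustrated with a simple one-way ANOVA (method-of-moments) estimator. This is a swapped-in textbook technique, not the estimator in the authors' Stata routine: it assumes balanced clusters (the same number of households per cluster) and takes first-stage regression residuals as given. Truncating the cluster variance at zero shows how an estimated variance of exactly 0 can arise when clusters add no explanatory power.

```python
import numpy as np

def anova_variance_components(resid, cluster):
    """Method-of-moments split of residual variance (balanced clusters only)."""
    resid = np.asarray(resid, dtype=float)
    labels, idx, counts = np.unique(cluster, return_inverse=True, return_counts=True)
    if not np.all(counts == counts[0]):
        raise ValueError("this sketch assumes balanced clusters")
    g, m = len(labels), counts[0]
    cluster_means = np.bincount(idx, weights=resid) / m
    within = resid - cluster_means[idx]
    # within-cluster mean square estimates sigma2_eps
    sigma2_eps = (within ** 2).sum() / (g * (m - 1))
    # between-cluster mean square has expectation sigma2_eps + m * sigma2_nu
    msb = m * ((cluster_means - resid.mean()) ** 2).sum() / (g - 1)
    sigma2_nu = max((msb - sigma2_eps) / m, 0.0)  # truncated at zero
    return sigma2_nu, sigma2_eps
```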
For the purpose of testing this method, we use administrative and survey data that cover the same households which can be matched with unique identifiers. This allows us to split the sample artificially, simulate a cross-survey imputation exercise, and compare predicted poverty with true poverty. This is an ideal data scenario that provides the conditions for a rigorous test of the cross-survey imputation model proposed.

Country background and data
The Syrian refugee crisis is one of the largest refugee crises ever recorded if we consider the number of displaced people relative to the population of the country of origin and the countries of destination. The crisis started in the spring of 2011 following clashes between protesters and government forces in several major cities and quickly descended into a complex civil war. By 2014, 6.7 million people had been displaced internally in the country, about 1.5 million people had fled the country by their own means, and an additional 3.7 million people were hosted as refugees, mostly in neighboring countries. As a result, about half of the Syrian population was considered displaced in 2014. For some countries, Syrian refugees also represented a major population shock. In 2014, Syrian refugees accounted for about 20% of the population of Lebanon and about 10% of the population of Jordan. The incidence of such immigration for these countries is among the highest ever recorded (Verme and Schuettler 2021).
The UNHCR has the mandate to protect and assist refugees in host countries, and its role in the aftermath of a crisis is to find shelter, provide food and cash assistance, and assist with basic services such as health and education. To provide these services, the UNHCR employs a system of mandatory registration for all refugees or asylum seekers requiring assistance, which involves collecting personal information. All individuals seeking protection, assistance, and refugee status are expected to register with the host government or the UNHCR and, for this purpose, the UNHCR maintains the Profile Global Registration System (proGres). This system contains biometric and socio-economic information on asylum seekers and refugees and serves to identify the persons most in need and determine the type of protection and assistance they require. ProGres does not offer information on income, consumption, or expenditure, but it contains a rich list of variables that are potentially closely associated with these monetary indicators. The proGres registration system is the most comprehensive database on refugees in any country where the UNHCR manages the registration of refugees. This is the case of Jordan, the country we consider in this paper.
In addition to the registration system, the UNHCR conducts sample surveys and home visits for a variety of purposes, such as the protection of different categories of vulnerable populations or the delivery of targeted programs such as cash or food assistance. In the case of Jordan and the Syrian crisis, the UNHCR and the World Food Program (WFP) have been conducting a variety of surveys as well as extensive home visits, which have allowed researchers to analyze refugee conditions in unprecedented detail.
The paper uses two data sets: the Jordan proGres registration system (PG for short) as of December 2014 and the Jordan Home Visits survey, round II data (HV for short) collected between November 2013 and September 2014. Both data sets were provided by the UNHCR in the context of the joint World Bank-UNHCR study on the welfare of Syrian refugees (Verme et al. 2016). These comprehensive data sets have the distinct advantage that they can be linked by a common identification number. We can therefore trace the same individuals and households across the two sources of data for the same period of time.
The proGres registration system is what we consider the "census" of refugees. This data set has no information on consumption but contains socio-economic characteristics for all registered individuals and households. Variables available in the PG data include, among others, date of birth, place of birth, gender, date and reasons of flight, arrival date in Jordan, registration date, ethnicity, religion, education, professional skills, and occupations in the countries of origin and asylum.
The HV data have been collected in successive rounds since 2013 for the purpose of targeting refugees with cash assistance programs and they contain information on income and expenditure as well as a large set of individual and household socioeconomic characteristics. Although this is not a sample survey, for the purpose of this study we will consider this data set as our hypothetical sample survey. The HV data we use cover about one-third of all registered persons in Jordan in 2014 and are a sub-sample of the PG data. Our experiment is restricted to households present in both data sets, a total of approximately 40,000 households. For these households, the socio-economic characteristics of the household and its members are the same by design. This data setup practically implies that Assumption 2' is not needed in our validation context. In fact, when households are interviewed during the home visits, the variables that are common in HV and proGres data sets are expected to be updated in proGres if these variables are outdated.
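Restricting the analysis to households present in both data sets amounts to an inner join on the common identification number. A minimal sketch, assuming each record is a dictionary and that the shared key is called `case_id` (a hypothetical field name; the actual proGres and HV variable names are not given in the text):

```python
def match_cases(pg_records, hv_records, key="case_id"):
    """Keep only cases present in both data sets (an inner join on `key`).

    On overlapping fields the HV value is kept, reflecting that common
    variables are meant to be identical (or updated) across the two sources.
    """
    hv_by_id = {record[key]: record for record in hv_records}
    matched = []
    for record in pg_records:
        hv = hv_by_id.get(record[key])
        if hv is not None:
            matched.append({**record, **hv})
    return matched

pg = [{"case_id": "A1", "size": 4}, {"case_id": "B2", "size": 2}]
hv = [{"case_id": "A1", "expenditure": 180.0}]
print(match_cases(pg, hv))  # [{'case_id': 'A1', 'size': 4, 'expenditure': 180.0}]
```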
As the unit of observation, we use what the UNHCR refers to as the "case." A case is a group of individuals who register at the UNHCR together with a Principal Applicant (PA) who takes responsibility for the group. This group may be a family, a household, or an extended household. For simplicity and practical purposes, we will consider a case and the PA as a household and its head, respectively. The poverty line used is 50 JD/month/person, which is what the UNHCR used in 2014 to select beneficiaries of the cash assistance program. In 2014, this poverty line was higher than the international poverty line and lower than the poverty line used for the Jordanian population. In our case, this poverty line is more relevant than either the national or the international poverty line, as it corresponds to what the UNHCR, the UN agency specialized in refugees, considers a sufficient amount to meet basic needs. As for the welfare aggregate, we use the same aggregate as Verme et al. (2016), which provides a detailed explanation of the consumption aggregate.
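Applying a per-person line to case-level data requires converting case expenditure to per capita terms. The sketch below assumes monthly case expenditure in JD and case size are available, counts a case as poor when per capita expenditure is strictly below the line, and by default weights each case by its size so that the rate refers to persons. The text does not specify the weighting or the treatment of ties, so both are assumptions, and the function name is hypothetical.

```python
def headcount(cases, line=50.0, person_weighted=True):
    """Poverty headcount from (monthly_case_expenditure_JD, case_size) pairs."""
    poor = total = 0.0
    for expenditure, size in cases:
        weight = size if person_weighted else 1
        total += weight
        if expenditure / size < line:   # per capita expenditure below the line
            poor += weight
    return poor / total

# A 4-person case spending 100 JD (25 JD per person) is poor;
# a 3-person case spending 300 JD (100 JD per person) is not.
print(headcount([(100, 4), (300, 3)]))                         # 4/7 of persons
print(headcount([(100, 4), (300, 3)], person_weighted=False))  # 0.5 of cases
```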

Imputation for the whole population
For the purpose of this paper, the HV data are considered the "survey" data containing information on consumption and the PG registration data are our "census" data containing predictors of consumption but no consumption data. The primary objective of the exercise is, therefore, to test how accurate the estimated poverty figures are using the HV data alone (as both the base and the target survey).
As a first step, we generated two samples by randomly extracting 50% of the observations from the HV sample (sample 1) and using the remaining observations as the second sample (sample 2). We then impute from sample 1 to sample 2 to obtain the imputation-based poverty rate in sample 2 and, for validation purposes, compare this imputed poverty rate with the true poverty rate that can be calculated directly from sample 2. We also run the imputation the other way around, imputing from sample 2 to sample 1 and comparing the result with the true poverty rate in sample 1. Naturally, given that the two sub-samples are extracted randomly from the same original sample, we should expect them to differ little and to provide similar estimates. In the next section, we perform additional tests using samples with greater heterogeneity. We consider three model specifications based on different sets of regressors for further comparison. Specification 1, our main specification, employs the variables that are only available in the PG data set (PG-specific variables), which include case (household) size and the PA's demographic and employment characteristics (age, gender, level of educational achievement, occupation group, marital status, and the governorate or city of original residence in the Syrian Arab Republic). Specification 1 also includes variables related to the PA's immigration status, such as the type of border crossing point and the legal status of entry. Specification 2 adds to specification 1 several variables that are only available in the HV data and relate to home ownership, household assets, utilities, and the physical characteristics of the house.
These variables include whether the house is rented or owned, the quality of the kitchen, electricity access, and the ventilation system, the living area of the house (measured in square meters per person), whether the house is made of concrete, and the availability of tap water and a piped sewerage system. Specification 3 further adds to specification 2 HV-specific variables related to the household's shock-coping strategies (i.e., whether the household receives humanitarian assistance, help from the host family, or help from the host community), whether the household has a valid certificate of asylum, and whether the household receives UNHCR financial assistance.
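The split-sample validation described above can be sketched on synthetic data. For brevity, this version fits plain OLS (the random effects being subsumed into the error term, an option discussed in the analytical framework) and draws normal errors; it is not the authors' Stata routine, and the function name is hypothetical.

```python
import numpy as np

def split_sample_check(X, log_y, z, n_sims=200, seed=0):
    """Random 50/50 split; impute in both directions and compare with truth.

    Returns {"1->2": (imputed_rate, true_rate), "2->1": (imputed_rate, true_rate)}.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(log_y))
    s1, s2 = perm[: len(log_y) // 2], perm[len(log_y) // 2:]
    results = {}
    for base, target, name in ((s1, s2, "1->2"), (s2, s1, "2->1")):
        # fit the consumption model on the base half only
        beta, *_ = np.linalg.lstsq(X[base], log_y[base], rcond=None)
        resid = log_y[base] - X[base] @ beta
        sigma = resid.std(ddof=X.shape[1])
        xb = X[target] @ beta
        # simulate consumption in the target half and take the mean headcount
        sims = [np.mean(xb + rng.normal(0.0, sigma, len(xb)) < z)
                for _ in range(n_sims)]
        # the "true" rate uses the target half's observed consumption
        results[name] = (float(np.mean(sims)), float(np.mean(log_y[target] < z)))
    return results
```

Because both halves come from the same random draw, the imputed and true rates should be close in both directions; the governorate exercise later in the paper relaxes exactly this similarity.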
We are particularly interested in examining whether adding HV-specific variables to the main specification (specification 1) improves the accuracy of the estimates. If some key predictors of household expenditure that are not available in the PG data significantly improve the accuracy of the poverty predictions, this provides a strong argument for collecting this information upfront when refugees are first registered. Conversely, if poverty estimates imputed with the PG data are not statistically different from the true rates (i.e., those produced directly from the HV data), this would suggest that existing PG variables are already suitable for producing reliable poverty estimates.
We also use two alternative models for the regression errors: one where we assume a normal distribution for the error terms, and another where we drop this assumption and use the (non-parametric) empirical distribution of the error terms instead. If the error terms are not normally distributed, poverty estimates based on the normal model could be biased, and a non-parametric model based on the empirical distribution would likely perform better. Table 1 presents the summary results and Appendix Table 1 in Appendix 2 provides the full regression results. Table 1 shows that all the estimates using the normal linear regression model fall within the 95% confidence interval (CI) of the true poverty rate, for both sample 1 and sample 2. In other words, these estimates are not statistically significantly different from the true poverty rates reported at the bottom of the table. Estimates using specifications 2 and 3, which include more variables on household assets and housing characteristics, are somewhat closer to the true poverty rate than those using specification 1 for both samples. For example, the poverty estimate using specification 1 (Table 1, first column) is 52.6%, which is 1.1 percentage points higher than the true poverty rate of 51.5%, while the estimate using specification 3 (Table 1, third column) is 52.3%, only 0.8 percentage points higher than the true rate. This is likely because imputation models that include household assets are usually found to perform better than those that do not (Christiaensen et al. 2012; Dang et al. 2019). The alternative imputation model based on the empirical distribution of the error terms (Table 1, row 2) performs better than the normal linear regression model, although both methods provide estimates within the 95% CI of the true poverty rates.
In addition, for both samples, while specification 2 still performs slightly better than specification 1, specification 3 now performs somewhat worse than specification 1. Adding more control variables does not necessarily lead to a better model fit. While this result may appear counter-intuitive, one possible reason is that additional variables may overfit the data and thus offer no additional accuracy, as shown with empirical evidence from India and Jordan (Dang et al. 2019). A recent theoretical study also suggests that, for misspecified regressions, adding more variables may result in larger inconsistency (De Luca, Magnus, and Peracchi 2018). Relatedly, the district random effects are estimated to have a variance of 0, which indicates that they contribute nothing to the model fit (Appendix 2, Table 2.1). In any case, since the standard error around the true poverty rate is 2.3 percentage points for sample 1 and 2.6 percentage points for sample 2, all these differences are still within one standard error of the true poverty estimates. As such, statistically speaking, the differences between the three specifications and the true poverty rates for both samples are negligible. Note that the standard errors around the true poverty estimates are larger than those for the imputation-based estimates, since the latter are model-based. Finally, since the HV data set is originally a nonrandom subsample of the PG database, we also re-run Table 1 using only variables that are available in the HV data set. The estimation results, shown in Appendix Table 2 in Appendix 2, are very similar to those in Table 1.
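The empirical-error alternative replaces the normal draws with draws from the estimated residuals. A minimal sketch, assuming the linear predictions for the target sample and a pooled vector of base-sample residuals are already available (this pools the cluster and household components rather than resampling them separately; the function name is hypothetical):

```python
import numpy as np

def impute_rate_empirical(xb_target, base_residuals, z, n_sims=1000, seed=0):
    """Poverty rate using resampled (empirical) residuals instead of normal draws."""
    rng = np.random.default_rng(seed)
    base_residuals = np.asarray(base_residuals, dtype=float)
    xb_target = np.asarray(xb_target, dtype=float)
    rates = np.empty(n_sims)
    for s in range(n_sims):
        # draw one residual per target household, with replacement
        draws = rng.choice(base_residuals, size=len(xb_target), replace=True)
        rates[s] = np.mean(xb_target + draws < z)
    return rates.mean()
```

With skewed residuals such as [-1, -1, -1, 3] (mean 0) and predictions of 0 at a line of 0, the empirical model gives a rate near 0.75, whereas a normal error with the same mean and variance would put it at 0.5. This illustrates why the choice of error distribution can matter for the estimated poverty rate.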
In summary, the set of variables available in the PG registration data seems sufficiently powerful to predict poverty rates that fall within the 95% confidence interval of the true poverty rate. This is very encouraging, considering that these variables were not selected for this purpose when the registration system was designed.
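The comparison with the true rate can be reproduced as simple arithmetic using the sample 1 figures reported above (estimate 52.6%, true rate 51.5%, standard error 2.3 percentage points). Like the text, this sketch uses only the standard error of the true rate and ignores the uncertainty of the imputed estimate; the function name is hypothetical.

```python
def compare_to_truth(estimate, truth, se_truth, z_crit=1.96):
    """Gap in percentage points and whether it lies within 1 SE / the 95% CI."""
    gap = estimate - truth
    return {"gap_pp": round(gap, 1),
            "within_1se": abs(gap) <= se_truth,
            "within_95ci": abs(gap) <= z_crit * se_truth}

print(compare_to_truth(52.6, 51.5, 2.3))
# {'gap_pp': 1.1, 'within_1se': True, 'within_95ci': True}
```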

Imputation from one geographical region to another
We next examine the situation where consumption data (or the welfare variable) are not available for a particular geographical location but are available for another, similar location. The control variables $x_j$, on the other hand, are available for both locations. Making similar assumptions, but for two locations instead of two data sources, we can employ the same imputation technique to impute from one location to the other and obtain poverty estimates. We consider two such governorates (regions) in Jordan, the Balqua governorate and the Irbid governorate. The Syrian refugees in these two governorates have very similar consumption levels (around 150 JD/month/person) and poverty rates (51-52%). t-Tests suggest that the $x_j$ characteristics are similar mainly for case sizes and for some, but not all, of the other variables (Appendix 2, Appendix Table 3). It is therefore an empirical question whether we can impute from one governorate into another in the same manner as the imputation exercise with the two samples in Table 1. Notably, in a real-life setting where we do not have consumption data for one region (but, say, know from older data that the two regions have comparable income and poverty levels), it is even more important to rely on the assumption of similar $x_j$ characteristics between the two regions. Table 2 shows that estimates are somewhat less accurate when we impute from the Irbid governorate into the Balqua governorate, but they still fall within the 95% CI of the true poverty rate. On the other hand, all estimates for the Irbid governorate are within one standard error of the true poverty rate. Estimates for two other governorates with similar levels of consumption and poverty, Ajloun and Jarash, also perform quite well and fall within one standard error of the true poverty rates (Appendix 2, Appendix Table 4).
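Checking whether a characteristic is similar across two governorates can be done with a two-sample t-test. The text does not say which variant was used, so this sketch assumes Welch's unequal-variance version and returns only the statistic and its approximate degrees of freedom; for samples of this size, |t| above roughly 1.96 signals a difference at the 5% level. The function name is hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# e.g., case sizes in two governorates (illustrative numbers only)
print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))  # (-1.0, 8.0)
```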

Robustness checks and extensions
This section provides several robustness tests and extensions for the results presented in Table 1. We offer estimates for different poverty lines, more disaggregated population groups, and alternative estimation methods.

Sensitivity to the poverty line
One important question is how the model specifications perform when the poverty line, and hence the poverty rate, changes. With a poverty rate close to 50%, half of the sample lies below and half above the poverty line, but estimating poverty accurately when the poverty rate is around 5-10% may be more difficult. In Fig. 1, we vary the poverty line from the 0 to the 60th percentile of the consumption distribution (i.e., poverty rates from 0 to 60%) and reproduce the poverty estimates using imputations from sample 1 to sample 2 with the two models described. The results show that with a low poverty line and a low poverty rate, the empirical errors model estimates true poverty more accurately than the normal linear model, while the normal linear model performs somewhat better when the poverty line and the poverty rate are high. Both methods thus yield predictions within the 95% CI of the true values, but their relative accuracy shifts slightly as the poverty line and the poverty rate change. Estimation results are similar if we impute from sample 2 to sample 1 (Appendix Fig. 1). A possible explanation is that, as the number of poor households (the sample size) increases, the distribution of the error term approaches a normal distribution.
Therefore, as a rule of thumb, we should expect the normal linear model to perform better with larger samples.

Table 1 Predicted poverty rates for Syrian refugees based on imputation, ProGres and HV data, 2014. The full regression results are provided in Appendix Table 1, Appendix 2. Specification 1 employs variables from the ProGres database only, and specifications 2 and 3 employ variables from both the ProGres and HV databases. The estimation sample is generated by splitting the data into two random samples named sample 1 and sample 2. The imputed poverty rates for sample 1 and sample 2 are shown in the first and second sets of three columns, respectively. The true poverty rate for each sample is shown at the bottom of the table.
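The poverty-line experiment can be set up as follows. This sketch places the line at several percentiles of the estimation sample's log consumption, then compares the true rate in a second sample with rates imputed under resampled residuals and under normal errors. The data-generating process (with deliberately right-skewed errors, so that the normal model is misspecified), the sample sizes and the number of simulations are hypothetical assumptions. The cluster random effect is ignored. The output illustrates the mechanics only and does not reproduce Fig. 1.

```python
import numpy as np

rng = np.random.default_rng(7)


def simulate(n):
    # Hypothetical process with right-skewed errors (scaled chi-square).
    case_size = rng.integers(1, 10, size=n)
    X = np.column_stack([np.ones(n), case_size])
    e = 0.5 * (rng.chisquare(3, size=n) - 3) / np.sqrt(6)
    return X, 5.5 - 0.12 * case_size + e


X1, logy1 = simulate(4000)  # sample 1: model is estimated here
X2, logy2 = simulate(4000)  # sample 2: poverty is imputed here

beta, *_ = np.linalg.lstsq(X1, logy1, rcond=None)
resid = logy1 - X1 @ beta
sigma = resid.std(ddof=X1.shape[1])
xb2 = X2 @ beta


def imputed_rate(log_z, errors, n_sims=200):
    """Share of simulated households below the line, pooled over simulations."""
    if errors == "empirical":
        draws = rng.choice(resid, size=(n_sims, len(xb2)))
    else:
        draws = rng.normal(0.0, sigma, size=(n_sims, len(xb2)))
    return float(np.mean(xb2 + draws < log_z))


results = {}
for pct in (10, 20, 30, 40, 50, 60):
    log_z = np.percentile(logy1, pct)  # poverty line at the pct-th percentile
    true = float(np.mean(logy2 < log_z))
    results[pct] = (true, imputed_rate(log_z, "empirical"),
                    imputed_rate(log_z, "normal"))
    print(pct, [round(v, 3) for v in results[pct]])
```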

Disaggregated population groups
The next question is whether the results are sensitive to changes in the population groups considered. We know from our regressions that the most important predictor of poverty is case size (see also Verme et al. 2016). If the predictive capacity of the model is sensitive to household characteristics, varying case size should therefore have the largest impact. We impute from sample 1 to sample 2 and re-estimate poverty for each case size. To keep the group sample sizes reasonable, we combine all cases with eight or more individuals into a single group (which makes up roughly 6% of the estimation sample). We apply the two error estimation models and plot the estimated poverty rates against case size in Fig. 2. Both methods provide similar results, and both sets of results are within the 95% CI of the true values; here we do not observe any sharp difference between the two error estimation models. As before, we repeat the exercise imputing from sample 2 to sample 1 (Appendix Fig. 2) and find that the results are virtually unchanged. Given the strong association between case size and poverty, this suggests that both estimation models perform reasonably well across population groups.

Table 2 Predicted poverty rates for Syrian refugees based on imputation for two different regions, ProGres and Home Visit Data. Specification 1 employs variables from the ProGres database only, and specifications 2 and 3 employ variables from both the ProGres and HV databases. The estimation sample is restricted to the Balqua and Irbid regions. The imputed poverty rates for these two regions are shown in the first and second sets of three columns, respectively. The true poverty rate for each sample is shown at the bottom of the table.
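The subgroup exercise can be sketched by reusing the simulated poverty indicators and averaging them within case-size groups, with the largest sizes top-coded at 8. In this sketch, the case-size distribution and consumption model are hypothetical (so the top-coded group is much larger than the 6% in the actual data), and the model is a plain OLS with resampled residuals rather than the cluster random effects model.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
case_size = rng.integers(1, 12, size=n)  # hypothetical case sizes 1-11
logy = 5.5 - 0.12 * case_size + rng.normal(0, 0.5, n)
X = np.column_stack([np.ones(n), case_size])

# Random split: estimate on sample 1, impute into sample 2.
idx = rng.permutation(n)
s1, s2 = idx[: n // 2], idx[n // 2:]
beta, *_ = np.linalg.lstsq(X[s1], logy[s1], rcond=None)
resid = logy[s1] - X[s1] @ beta
log_z = np.median(logy[s1])  # hypothetical poverty line

# Empirical-errors imputation: 200 simulated poverty indicators per household.
poor_sim = X[s2] @ beta + rng.choice(resid, size=(200, len(s2))) < log_z

# Top-code case size at 8 so the largest group keeps a usable sample.
group = np.minimum(case_size[s2], 8)
by_group = {}
for g in range(1, 9):
    mask = group == g
    true_rate = float(np.mean(logy[s2][mask] < log_z))
    imputed_rate = float(poor_sim[:, mask].mean())
    by_group[g] = (true_rate, imputed_rate)
    print(g, int(mask.sum()), round(true_rate, 3), round(imputed_rate, 3))
```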

Models with a stronger parametric assumption
One alternative to the present poverty estimation models is to run a probit or logit model directly on poverty status, rather than a linear model on expenditure whose predictions are subsequently converted into poverty estimates. In this case, the population is first divided into poor and non-poor groups using the poverty line, and this binary variable is then used as the dependent variable in a logit or probit model to predict poverty. A probit (or logit) model requires a stronger parametric assumption on the dependent variable, which can yield more accurate estimates if the assumption is correct. The disadvantage is that the estimates may be worse if the assumption is violated. Furthermore, converting the continuous expenditure variable into a binary poverty indicator can result in a loss of information and generally less efficient estimation (Ravallion 1996). Indeed, Appendix Table 5 in Appendix 2 shows that while the probit and logit estimates are still within the 95% CI of the true rates, they are somewhat less accurate than those obtained using the empirical errors model in Table 1. For example, the estimated poverty rate using specification 1 and sample 2 for the logit model is 53.1%, which is 1.3 percentage points larger than the corresponding figure of 51.8% for the empirical errors model (compared with the true poverty rate of 51.6%).
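The two routes can be contrasted in a short sketch. Route (a) dichotomizes consumption at the poverty line, fits a logit and takes the average predicted probability in the second sample as the poverty rate. Route (b) models log consumption and simulates poverty status with resampled residuals. The logit is fitted by a hand-written Newton-Raphson loop to stay dependency-free (a probit would be analogous). The data and poverty line are hypothetical, and the sketch is not expected to reproduce the accuracy ranking in Appendix Table 5.

```python
import numpy as np


def fit_logit(X, d, n_iter=25):
    """Logit by Newton-Raphson; X must already contain an intercept column."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))
        grad = X.T @ (d - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        b = b + np.linalg.solve(hess, grad)
    return b


rng = np.random.default_rng(21)


def simulate(n):
    case_size = rng.integers(1, 10, size=n)
    X = np.column_stack([np.ones(n), case_size])
    return X, 5.5 - 0.12 * case_size + rng.normal(0, 0.5, n)


X1, logy1 = simulate(4000)  # sample 1: estimation
X2, logy2 = simulate(4000)  # sample 2: prediction
log_z = np.median(logy1)    # hypothetical poverty line
true_rate = float(np.mean(logy2 < log_z))

# (a) Binary route: dichotomize first, then model Pr(poor) with a logit.
b = fit_logit(X1, (logy1 < log_z).astype(float))
logit_rate = float(np.mean(1.0 / (1.0 + np.exp(-(X2 @ b)))))

# (b) Continuous route: model log consumption, then simulate poverty status.
beta, *_ = np.linalg.lstsq(X1, logy1, rcond=None)
resid = logy1 - X1 @ beta
draws = rng.choice(resid, size=(200, len(X2)))
emp_rate = float(np.mean(X2 @ beta + draws < log_z))
print(round(true_rate, 3), round(logit_rate, 3), round(emp_rate, 3))
```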

Challenges for applications in other contexts
The data on Syrian refugees in Jordan that we analyze are of relatively high quality in the context of refugee populations. In this section, we discuss methodological challenges in other contexts where data quality may be lower, as well as the potential for applying our method to other contexts with similar data.

Small survey sample sizes
One practically relevant question is how large the imputation sample should be to obtain accurate poverty estimates. 16 On the one hand, a large sample size can provide estimates with more accuracy and generally better statistical properties than a small sample size; but on the other hand, it is also more expensive and demands more logistical and technical resources to implement. A balance should be reached between these trade-offs. In most conflict situations, however, the logistical and technical constraints may pose especially severe challenges for data collection efforts.

Fig. 2 Predicted poverty rates for different population sub-groups
16 Note that this challenge of finding an appropriate sample size concerns predicted values based on regression models, which is different from calculating sample sizes for other purposes, such as hypothesis testing. For the latter, see, e.g., Cohen (1988) for a textbook treatment.

Park and Dudycha (1974) offer some theoretical guidance on selecting the appropriate sample size for obtaining regression-based prediction estimates. In particular, we want to find the sample size n such that

$$\Pr\left(\rho^2 - \rho_c^2 \le \varepsilon\right) = \gamma \quad (7)$$

where $\rho^2$ is the maximum (or true) squared multiple correlation coefficient ($R^2$) possible for Eq. (1) in the population, and $\rho_c^2$ is the squared correlation between the value predicted using Eq. (1) and the original y variable. $\rho_c^2$ is usually referred to as the squared cross-validity correlation coefficient. 17 A good sample size ensures that the probability ($\gamma$) of obtaining an estimate within an acceptable error interval ($\varepsilon$) around $\rho^2$ is reasonably high. In other words, after we specify acceptable values for $\varepsilon$ and $\gamma$, the sample size n that satisfies Eq. (7) can be derived as follows:

$$n = \frac{\delta^2 \left(1 - \rho^2\right)}{\rho^2} \quad (8)$$

where $\delta^2$ is the noncentrality parameter for the noncentral Student's t distribution with p-1 degrees of freedom associated with Eq. (7), and p is the number of predictors (i.e., explanatory variables) in the estimation model. We provide a more detailed description of Park and Dudycha's (1974) derivations in Appendix 1, Part B.
We apply Eqs. (7) and (8) and calculate the sample sizes for $\varepsilon$ ranging from 0.01 to 0.05 and $\gamma$ ranging from 0.90 to 0.99. 18 These ranges should cover most cases of interest; a smaller value of $\varepsilon$ and a larger value of $\gamma$ require a larger sample size. In particular, the smallest sample size within these ranges corresponds to $\varepsilon$ = 0.05 and $\gamma$ = 0.90, that is, a probability of 0.90 that $\rho_c^2$ falls within a bandwidth of 0.05 around the true value of $\rho^2$. Increasing this probability to, say, 0.95 and tightening $\varepsilon$ to 0.02 would require a larger sample size. We also assume that $\rho^2$ is 0.45 and the number of predictors p is 27, which are the parameters obtained under specification 1 for sample 2 in Table 1. The estimates provided in Table 3 suggest that the minimum sample size is 389 observations ($\varepsilon$ = 0.05 and $\gamma$ = 0.90), and that a reasonably good sample size is 1,068 observations ($\varepsilon$ = 0.02 and $\gamma$ = 0.95). Table 3 also indicates that the largest sample size, required to raise $\gamma$ to its maximal value of 0.99 and reduce $\varepsilon$ to its minimal value of 0.01, is 2,509 observations. While Park and Dudycha's formulae provide useful theoretical guidance on the appropriate sample size, they were originally developed for the simple OLS model and thus do not explicitly take into account the cluster random effects in our model. It therefore remains an empirical question whether these formulae apply in our context.
We address this question and show the estimation results in Fig. 3. The estimates in this figure are restricted to sample 2, from which 10 sub-samples of different sizes (200, 400, 600, 800, 1,000, 1,500, 2,000, 3,000, 4,000, and 5,000 observations) have been drawn randomly. The first five sub-samples range from below the theoretical minimum sample size (200) to just below the theoretically ideal sample size (1,000), and the last five range from above the theoretically ideal sample size (1,500) to a common and reasonably good sample size in practice (5,000). Specification 1 is then re-run on each sub-sample; the underlying regression results are provided in Appendix 2, Appendix Table 6.
The results show that almost all the poverty estimates fall within one standard error of the true poverty rate, and that there appears to be no strong relationship between the number of observations and the accuracy of the results. 19 Yet plotting all the estimation results for the normal linear and empirical errors models in Fig. 3 yields two additional observations. First, with both estimation methods, estimates fluctuate less at sample sizes of around 1,000 observations; second, the normal linear model tends to overestimate the true value more than the empirical errors model does. 20 We can also observe from Appendix Table 6 that the estimated R² of the model specifications tends to decline and stabilize as the number of observations increases. This is consistent with the well-known statistical result that R² estimates in smaller samples may be larger than their population counterparts (see, e.g., Pituch and Stevens 2016). In essence, good estimates can be obtained even with very small samples, but samples of medium size (around 1,000 observations in our case) seem to offer reasonably stable estimates while containing survey costs. This sample size is also consistent with the theoretical results of Park and Dudycha (1974).
These results have practical relevance. The HV data used in this study were collected with field visits that covered about 5000 households per month, or 60,000 households per year. We have shown that covering about one-sixtieth of this number, or 1000 households per year, may be sufficient to provide reliable poverty statistics. 21
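The sub-sampling experiment can be mimicked as follows. This sketch draws survey sub-samples of the sizes listed above from a hypothetical pool, re-estimates a simple OLS model on each, imputes poverty into a separate set of target households with resampled residuals, and records the gap with the true rate in that target set. The population, the single predictor and the split into pool and target are illustrative assumptions rather than the design behind Fig. 3.

```python
import numpy as np

rng = np.random.default_rng(11)
N = 20000
case_size = rng.integers(1, 10, size=N)
X = np.column_stack([np.ones(N), case_size])
logy = 5.5 - 0.12 * case_size + rng.normal(0, 0.5, N)
log_z = np.median(logy)  # hypothetical poverty line

pool = np.arange(N // 2)        # households eligible for the survey sub-sample
target = np.arange(N // 2, N)   # households whose poverty rate is imputed
true_rate = float(np.mean(logy[target] < log_z))


def imputed_rate(n, n_sims=100):
    """Fit on a random survey sub-sample of size n, impute into the target."""
    s = rng.choice(pool, size=n, replace=False)
    beta, *_ = np.linalg.lstsq(X[s], logy[s], rcond=None)
    resid = logy[s] - X[s] @ beta
    draws = rng.choice(resid, size=(n_sims, len(target)))
    return float(np.mean(X[target] @ beta + draws < log_z))


sizes = [200, 400, 600, 800, 1000, 1500, 2000, 3000, 4000, 5000]
errors = {n: imputed_rate(n) - true_rate for n in sizes}
for n, err in errors.items():
    print(n, round(err, 3))
```

Repeating the loop with different seeds gives a rough sense of how the spread of the errors narrows with n.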

Related measures of poverty
How does our proposed poverty imputation method compare with alternative estimation methods such as asset (wealth) indexes and proxy means tests? In this section we examine each of these two alternatives, together with the related exercise of targeting. This is a particularly important question for UNHCR, which uses asset indexes in place of consumption to measure well-being in many places where consumption data are not available. Other development organizations such as the WFP also often employ asset indexes to target food assistance programs for …

Table 3 Theoretical sample size as a function of the population parameters. Estimates are based on the formulas provided in Park and Dudycha (1974), using the R² value of 0.45 and the 27 predictors under specification 1 from Table 1. The table is indexed by epsilon (ε) and gamma (γ).

Asset index
We consider a variant of Eq. (1) where the left-hand-side variable, household consumption $y_j$, is now missing, but we have data on household assets $a_j$, which are a subset of $x_j$. We still want to generate a wealth index $w_j$ that offers the best combination of the elements of the household assets $a_j$. Suppressing the household index to make the notation less cluttered, this can be expressed as follows:

$$\alpha' a_j = w_j \quad (9)$$

where $\alpha$ is the vector of weights placed on $a_j$ to generate the wealth index $w_j$. A common way to derive $\alpha$ is through Principal Component Analysis (PCA); another is simply to sum all the assets available in $a_j$. We briefly describe here a few reasons why asset indexes are more likely to yield biased estimates of poverty. First, the wealth index $w_j$ excludes the non-asset components of $x_j$, which is equivalent to the well-known problem of omitted variable bias. Second, $\beta_1$ (the coefficients on $a_j$ in Eq. (1)) and $\alpha$ generally differ, since the PCA estimator for $\alpha$ maximizes the variance of the index built from $a_j$, while the estimator for $\beta_1$ is chosen to best explain the variance in $y_j$. 22 Finally, in a refugee context, the temporary nature of displacement likely affects refugees' behavior in accumulating and using assets. For example, refugees may choose not to invest as much in high-quality durables as regular households do. This may make assets alone an even less reliable data source for poverty estimation in a refugee context. Table 4 provides an illustrative example where we generate the wealth (asset) index using both the simple counting method (Table 4, model 1) and the PCA method (Table 4, models 2 and 3) on the two samples. Each cell in the first five rows shows the proportion of each quintile of the consumption distribution that is correctly captured by the corresponding quintile of the wealth index. In other words, the five quintiles provide five different slices of the consumption distribution.

22 See Rencher (2002, p. 389) for a graphical illustration of the general difference between principal component analysis and OLS methods, and Dang et al. (2019) for further discussion on asset indexes.
The list of assets for model 1 and model 2 includes the status of the kitchen, electricity, the ventilation system, whether the house is made of concrete, and the availability of tap water and a piped sewerage system. Model 3 adds to model 1 the house size and the condition of household furniture.
Consistent with our earlier discussion, the quintiles based on the wealth index capture only between 12 and 35% of the corresponding consumption quintile. For example, the poorest wealth-index quintile in model 3 correctly captures only 32% (34%) of the poorest consumption quintile in sample 1 (sample 2). The correlation between asset indexes and household consumption is not very strong, ranging between 0.21 and 0.23. 23 This is roughly half the correlation of 0.44 to 0.48 (for specification 1 and specification 3 in Table 1, respectively) between observed household consumption and the consumption predicted by our method. This supports our earlier argument that asset indexes may not be good predictors of household welfare and poverty, particularly in a refugee context.
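The quintile-agreement measure used in Table 4 can be computed along these lines. This sketch builds a PCA wealth index in the spirit of Eq. (9), taking $\alpha$ as the first principal component of standardized asset indicators, then reports the share of each consumption quintile that falls in the same wealth quintile. The six binary assets and their weak link to consumption are hypothetical. With binary assets the index has many ties, so the empirical quintiles are only approximate; the counting index `A.sum(axis=1)` can be passed to the same function.

```python
import numpy as np


def pca_index(A):
    """Wealth index from the first principal component of standardized assets."""
    Z = (A - A.mean(axis=0)) / A.std(axis=0)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    w = Z @ vt[0]
    if np.corrcoef(w, A.sum(axis=1))[0, 1] < 0:
        w = -w  # principal components are only defined up to sign
    return w


def quintile(v):
    """Quintile index 0 (poorest) to 4 based on empirical cut points."""
    cuts = np.quantile(v, [0.2, 0.4, 0.6, 0.8])
    return np.searchsorted(cuts, v, side="right")


def quintile_agreement(wealth, consumption):
    """Share of each consumption quintile placed in the same wealth quintile."""
    qw, qc = quintile(wealth), quintile(consumption)
    return np.array([np.mean(qw[qc == k] == k) for k in range(5)])


# Hypothetical data: six binary assets only weakly related to consumption.
rng = np.random.default_rng(5)
n = 4000
logy = rng.normal(5.0, 0.5, n)
latent = rng.normal(size=(n, 6)) + 0.4 * (logy[:, None] - 5.0) / 0.5
A = (latent > 0).astype(float)

w = pca_index(A)
shares_pca = quintile_agreement(w, logy)
shares_count = quintile_agreement(A.sum(axis=1), logy)
print(np.round(shares_pca, 2), np.round(shares_count, 2),
      round(float(np.corrcoef(w, logy)[0, 1]), 2))
```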

Proxy means test
Most proxy means tests start from a general equation that can be described as follows:

$$y^p_j = \beta_p' x_{j,p} \quad (10)$$

where the vector of coefficients $\beta_p$ is obtained from a regression using another survey (see, e.g., Coady et al. 2014; Ravallion 2016; Brown, Ravallion, and van de Walle 2018). Proxy means tests are thus rather similar to the poverty imputation model in Eq. (1) in terms of the deterministic part $\beta_p' x_{j,p}$. One key difference between the two methods, however, is that the error terms $\eta_{cj} + \varepsilon_j$ in Eq. (1) are often omitted in Eq. (10). Consequently, the mean and variance of consumption predicted by proxy means testing would likely be biased estimates of those of household consumption. Even when $x_{j,p}$ is identical to $x_j$ (or when the error terms $\eta_{cj} + \varepsilon_j$ are negligible), there is no bias in the estimated mean consumption, but there is still bias in the estimated variance. 24 Table 5 provides poverty estimates using the proxy means test method as in Eq. (10). Two remarks are in order. First, the estimates fall outside the 95% CI of the true poverty rate for both samples, which suggests that the error terms $\eta_{cj} + \varepsilon_j$ in Eq. (1) are not negligible. Second, consistent with our theoretical discussion above, the standard errors of the poverty estimates in Table 5 range from 2.5 to 2.9 percentage points, roughly 10 to 25% larger than those based on the poverty imputation methods shown in Table 1.

23 These correlation coefficients between the wealth indexes and consumption are weaker than those observed in Filmer and Scott (2012) for 11 other countries around the world (which range from 0.39 to 0.72). Indeed, assets may capture aspects of household welfare other than consumption, which could explain the weak correlation between the wealth indexes and consumption.
24 Dang et al. (2019) offer a more detailed discussion and more formal proofs of these results.

Targeting ratios
The importance of modeling the error terms can be further appreciated when we estimate targeting ratios such as the percentage of the poor population that is correctly identified (the coverage rate) and the percentage of the population identified as poor who are not poor (the leakage rate). As with the poverty rate, we need multiple simulations to estimate these targeting rates. In particular, the coverage rate and the leakage rate are computed as follows:

$$\text{Coverage} = \frac{1}{S}\sum_{s=1}^{S} \frac{\sum_i I\left(y^1_{2i,s} \le z_1,\ y_{2i} \le z_1\right)}{\sum_i I\left(y_{2i} \le z_1\right)}$$

$$\text{Leakage} = \frac{1}{S}\sum_{s=1}^{S} \frac{\sum_i I\left(y_{2i} > z_1,\ y^1_{2i,s} \le z_1\right)}{\sum_i I\left(y^1_{2i,s} \le z_1\right)}$$

where I(.) is the indicator function, $y^1_{2i,s}$ is the consumption of household i in sample 2 imputed in simulation s from the model estimated on sample 1, $y_{2i}$ is its observed consumption, $z_1$ is the poverty line, S is the number of simulations, and the subscript i indicates households.
Estimates based on the empirical errors model, shown in Table 6, suggest that Specification 1 can provide a reasonable coverage rate of 70% and a leakage rate of roughly 32%. As we add more control variables to this specification, these rates unsurprisingly improve. In particular, the coverage rate increases by almost 4 percentage points, while the leakage rate decreases by 3 percentage points when we switch from Specification 1 to the richer Specification 3. These rates compare favorably with recent estimates of a 64% coverage rate and a 31% leakage rate obtained using proxy means tests at a similar poverty rate of 40% for nine African countries (Brown et al. 2018).
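The contrast between a proxy means test and imputation, and the computation of the targeting ratios, can be sketched together. Here the proxy means test uses only the deterministic prediction $\beta_p' x_{j,p}$ of Eq. (10), whereas the imputation adds resampled residuals over S simulations. Coverage and leakage are computed per simulation and then averaged, following the definitions in the text. The data-generating process, the 40th-percentile poverty line and S = 200 are hypothetical choices, and the cluster random effect is ignored.

```python
import numpy as np

rng = np.random.default_rng(9)


def simulate(n):
    case_size = rng.integers(1, 10, size=n)
    X = np.column_stack([np.ones(n), case_size])
    return X, 5.5 - 0.12 * case_size + rng.normal(0, 0.5, n)


X1, logy1 = simulate(4000)           # sample 1: estimation
X2, logy2 = simulate(4000)           # sample 2: targeting
log_z = np.percentile(logy1, 40)     # hypothetical poverty line (~40% poor)

beta, *_ = np.linalg.lstsq(X1, logy1, rcond=None)
resid = logy1 - X1 @ beta
xb2 = X2 @ beta
truly_poor = logy2 < log_z

# Proxy means test as in Eq. (10): deterministic prediction, no error term.
pmt_rate = float(np.mean(xb2 < log_z))

# Imputation with empirical errors: S simulated consumption vectors.
S = 200
pred_poor = (xb2 + rng.choice(resid, size=(S, len(xb2)))) < log_z  # (S, n)
imp_rate = float(pred_poor.mean())

# Targeting ratios per simulation, then averaged over simulations.
coverage = float(np.mean((pred_poor & truly_poor).sum(axis=1)
                         / truly_poor.sum()))
leakage = float(np.mean((pred_poor & ~truly_poor).sum(axis=1)
                        / pred_poor.sum(axis=1)))
print(round(float(truly_poor.mean()), 3), round(pmt_rate, 3),
      round(imp_rate, 3), round(coverage, 3), round(leakage, 3))
```

The deterministic proxy means test prediction has a much smaller variance than observed log consumption, which is the variance bias discussed above. Its poverty rate therefore depends on where the line falls relative to the discrete predicted values.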

Potential application to other settings
The methodology proposed in this paper can be replicated in most countries hosting refugees. As an example, Table 7 reports proGres data on nine refugee-hosting countries in Sub-Saharan Africa: Cameroon, Chad, the Republic of Congo, the Democratic Republic of Congo, Ethiopia, Kenya, Niger, Rwanda, and the United Republic of Tanzania. Some of these countries, such as the Republic of Congo, the Democratic Republic of Congo, and Chad, typically suffer from a lack of quality data. The table reports the number of observations and the percentage of total frequencies for eight key variables that can generally be used to estimate the household consumption model (i.e., Eq. (1)). In the nine countries considered, almost all these variables have sufficient observations for modeling, except in a few countries where they are understandably under-covered or nonexistent (e.g., occupation in the DRC or ethnicity in Rwanda). Table 7 also shows the latest available refugee survey for each country, which collects information on case size and the socio-economic characteristics of the PA, in addition to other characteristics. For all these countries, the latest surveys covering refugees are quite recent, ranging from 2017 to 2020. Since the proGres data are administrative data updated continuously for all these countries, our proposed imputation method may be applied to fairly recent data in all the listed countries. In fact, a first experiment in that direction has been implemented for Chad with rather encouraging results (Beltramo et al. 2021).

Conclusion
We provide a first application of survey imputation methods to obtain poverty estimates for Syrian refugees living in Jordan. Our results show that imputation-based poverty estimates are not statistically different from poverty rates based on observed consumption, and this result is robust to various validation tests. These estimates perform better than, or have smaller standard errors than, other poverty measures based on asset indexes or proxy means testing. Moreover, our imputation models are rather parsimonious and use variables that are already available in UNHCR's global registration system. These encouraging results are consistent with the findings of recent studies on imputation-based poverty estimates for regular populations.

The full regression results are provided in Appendix Table 1, Appendix 2. The estimation sample is generated by splitting the data into two random samples named sample 1 and sample 2. We then impute from sample 1 to sample 2 and vice versa to obtain the imputed poverty rate for each sample. The true poverty rate for each sample is shown at the bottom of the table.
The estimation results also point to the need for further research on an alternative and promising method of obtaining poverty estimates for refugees where it is expensive or logistically challenging to implement a large-scale survey. We provide both theoretical and empirical evidence for Jordan that relatively small surveys may be fielded for refugees, and that data from these surveys can be combined with those from the census-type registration system to provide cost-effective and updated estimates of poverty. While these results are encouraging, they are not definitive and should be replicated in other contexts, possibly using surveys that have a more detailed consumption module. If further validated in other contexts, including some sub-Saharan countries with available and similar ProGres data on refugees, these findings could lead to significant reductions in data collection costs in refugee operations.

Table 6 Coverage and leakage rates based on imputation, ProGres and Home Visit Data. The full regression results are provided in Appendix Table 1, Appendix 2. Specification 1 employs variables from the ProGres database only, and specifications 2 and 3 employ variables from both the ProGres and HV databases, using the empirical errors model. The estimation sample is generated by splitting the data into two random samples named sample 1 and sample 2. The imputed targeting rates are obtained using the empirical errors model on sample 2. Robust standard errors in parentheses are clustered at the district level. We use 1,000 simulations for each model run.