1 Introduction

The debate on—and use of—poverty lines for policymaking and academic research has a long history. The first income poverty lines were developed by local government officials working for school boards in the major cities of England in the early 1870s (Gillie, 2008). The idea of the poverty line was popularised by Charles Booth (1893), although he did not invent this concept (Gillie, 1996). Similarly, Rowntree (1901) costed a full subsistence budget standard designed to provide the minimum necessary to maintain ‘physical efficiency’. However, it is important to note that both Booth and Rowntree used these income poverty lines purely as a heuristic device. The poor were mainly identified in both studies as those in obvious want and squalor, i.e. based on the opinions of survey enumerators and school board visitors about the families’ living conditions (Gillie, 1996; Stone, 1997). Discriminant analyses of a subset of the data in Booth’s notebooks demonstrated that the number of rooms (overcrowding) and the subjective assessments of school board visitors about a family (e.g. ‘poor but honest’) were of significantly more importance for classifying a family as poor or otherwise than any estimate of the family’s income (Bales, 1994).

Questions such as ‘How much is enough?’ and ‘How can such criteria be validated?’ have long been asked (Townsend & Gordon, 1993), yet over a century after the research of Booth and Rowntree, poverty research still lacks an uncontested, systematic and clear answer to the question of how to identify the optimal poverty line, i.e., where to draw the line between the poor and not-poor groups. This problem remains irrespective of the poverty measure that is used (e.g., income, expenditure and/or observed deprivations). In multidimensional poverty measurement, two main approaches are employed to aggregate different domains: intersection (i.e., lacking resources AND being deprived) and union (lacking resources OR being deprived) (Boltvinik, 1999). There are, however, variants of these approaches, such as weighted combinations of income and deprivation or those based on one exclusive domain (uni-domain) within which different dimensions of deprivation exist (Alkire & Foster, 2011).

All these aggregation approaches face the challenge of providing a sound theoretical and empirical basis for the chosen poverty line that should, at least, fulfil one normative and two statistical criteria. From the normative point of view, the key concern is about the meaning of the poor and not-poor groups. For example, according to Townsend (1979), the optimal division should split the population into two distinct groups where the poor are those whose resources are so low that they are in fact excluded from ordinary living patterns, customs and activities. From the statistical perspective, the poverty line must neither underestimate nor overestimate poverty and it must minimise false positive and false negative rates.

One source of confusion in the poverty literature has been the different labels attached to the poverty line and the relevant domain to which each label is applied. In the classical works on income poverty, the poverty cut-off was referred to as ‘Z’ (Atkinson, 1987; Foster et al., 1984) but, in deprivation-only measures, Z was sometimes used as the cut-off for transforming ordinal or nominal variables into a binary indicator (Alkire & Foster, 2011). In the most recent literature, k is defined as the threshold used to identify the poor both in weighted union approaches and in indices based only on deprivation indicators (Alkire et al., 2015). Intersection approaches rely on two cut-points, with k being the deprivation value at which poverty in the low living standards domain is identified, whilst Z is the income poverty line.

A range of options for selecting an optimal cut-point have been proposed in the literature. Normative practices choose a threshold k based on policy standards or on a theoretical argument. An example is the official Mexican multidimensional poverty measure, which combines low income and a social rights index (intersection): for the income domain, it uses budget standards to identify Z, and it sets a cut-point of one or more social deprivations (i.e. k = 1) on the normative basis that no human/social rights should be violated (Coneval, 2011). Similarly, deprivation-based measures, such as UNICEF’s rights-based approach to poverty measurement, use k = 2 in order to err on the side of caution in ensuring that the deprivations are a result of low income/resources rather than some other reason (e.g. discrimination or ill health) (Gordon et al., 2003). Other approaches lack clear theoretical guidance or a normative basis for setting k and instead rely on sensitivity analyses that look at the effect of different ks on the poverty rate (Alkire & Foster, 2011). However, it is unclear how the results of sensitivity analyses can be used to identify the optimal poverty threshold that reduces false positive and false negative rates and results in the classification of meaningful groups. With regard to income (Z), others use arbitrary thresholds, such as < 60% of median equivalised household net income in Europe for the official AROP (At Risk Of Poverty) measure (Guio et al., 2016, 2017), or an essentially arbitrary calorie intake threshold (e.g. 2100 kcal per capita or adult equivalent) as a reference for calculating the household-level food poverty expenditure line (Ravallion, 1998). Z might also be derived via budget standards approaches that assess the level of consumption needed to achieve a specific standard of living (Bradshaw, 1993; Bradshaw et al., 1987) or a healthy lifestyle (Deeming, 2005).

Empirical approaches aim to infer the poverty cut-point from the observed data. This might be done in a purely exploratory manner or by drawing on a theory to find k. An example of a theory-based approach is the Bristol Optimal Method (BOM), which draws on Townsend’s theory of relative deprivation (see next section for an explanation) (Gordon, 2006; Townsend, 1979; Townsend & Gordon, 1993). The BOM examines the relationship between income and deprivation using generalised linear models (GLMs) to find the best possible grouping for different ks, given Zs (Gordon, 2010; Gordon & Nandy, 2012). The BOM requires both income and deprivation data. However, in many applications, income is missing from the data and the BOM cannot be fully implemented. Babones et al. (2015) proposed using a Poisson-based framework to find k, one of its more attractive properties being that it does not rely on income or expenditure to find the optimal cut-point. Latent class analysis (LCA) and factor mixture models have also been applied to understand multiple deprivation and indirectly find k by looking at the resulting classes, which are often more than two (Moisio, 2004; Najera, 2016; Whelan & Maitre, 2005). More recently, Notten and Kaplan (2022) proposed another method that assesses false-positive and false-negative rates as a means of narrowing down the candidate thresholds for identifying poverty. Similarly, Nájera (2023) relies on Bayesian estimation methods to estimate classification errors which, in turn, can be used to approximate the best empirical split.

Having empirical criteria to inform the selection of the poverty cut-off is vital for poverty research. Existing empirical approaches offer the possibility of reducing the vagueness of arguments in favour of or against certain values of k. However, all these approaches are based on a range of assumptions and their relative reliability has not been assessed. Thus, it is an open question under what circumstances these methods can be trusted or disregarded. Furthermore, other potential alternatives have not been applied in poverty research and it is unknown whether they might be useful for finding k and cross-validating the existing alternatives.

The objective of this paper is to assess the circumstances under which a range of methods offer a useful tool for consistently finding the best k value. The paper focuses mainly on poverty measures that use an intersection criterion to classify the poor and not-poor populations, but the results are also relevant for measures based purely on deprivation scores. Measures based on union approaches, i.e., those that combine three domains, are not the main objective of the simulation study. The paper is organised as follows. The next section reviews the theoretical basis of the poverty line and the key arguments for the existence of a poverty threshold, which justifies the idea of a poor group. Section 3 describes the methods used for the analysis and discusses their assumptions. Section 4 describes the simulation design, Sect. 5 presents the results together with an example based on real data, and the final section discusses the findings of the study.

2 Theoretical Basis of the Poverty Line

Poverty has been defined as the lack of command of sufficient resources over time, the outcome of which is social and material deprivation (Gordon, 2006; Townsend, 1979). According to this theory of relative deprivation, people with low levels of resources, primarily income, are likely to also have low living standards. The relationship between resources and deprivation is not perfect, as both would not be expected to completely overlap (Halleröd, 1995; Perry, 2002; Saunders & Adelman, 2006; Whelan et al., 2004). This is a result of the dynamic relationship between deprivation and resources as well as measurement limitations, i.e., problems with data collection, particularly income (Gordon, 2006). A household’s expenditure may reflect its permanent income (i.e., expected long-term average income) rather than current income (Friedman, 1957). Thus, changes in income are likely to have a lagged impact rather than an immediate one upon a household’s living standards (Katzman, 2000). The timespan will depend on different factors (ability to incur debt, social policy, intra-household transfers, social security replacement rates, etc.) but, from a theoretical point of view, the relationship between low resources and deprivation holds.

According to the theory of relative deprivation, the relationship between income and deprivation is not linear. Townsend (1979, 1987) suggested that, as resources fall, there is a level at which deprivation increases dramatically relative to an additional unit fall in resources. Furthermore, according to Townsend, this level represents the scientific/objective poverty line and is one of the few frameworks that provides an explicit hypothesis about how to distinguish the poor from the not-poor by testing empirical relations. Townsend (1979, p. 261) attempted to assess this hypothesis via a visual inspection of the relationship between income and a deprivation index score (sum or count of deprivations) and he found a level of income at which the number of deprivations increased dramatically, i.e., given small changes in resources, we see a high risk of having multiple deprivations. This kind of relationship has been found in many surveys in many countries and, although it originates from the relative deprivation theory, it has been used for other approaches that rely on the intersection approach (UK, Europe, Mexico, Benin (wealth index), respectively) (Gordon & Nandy, 2012; Guio et al., 2012; Halleröd, 1995; Nandy & Pomati, 2015; Townsend & Gordon, 1993).

During the late 1980s, the existence of the ‘Townsend break point’ optimum poverty threshold was widely debated and assessed using different empirical approaches (Callan et al., 1993; Desai & Shah, 1988; Hutton, 1991; Mack & Lansley, 1985; Piachaud, 1987). The main argument against this hypothesis is that ‘living standards’ are a continuum and that any cut-off would be essentially arbitrary and likely to be the result of a statistical artefact (Mack & Lansley, 1985; Piachaud, 1987).

However, most empirical studies have supported the existence of such an inflection point, using either visual or econometric methods (Desai & Shah, 1988; Hutton, 1991; Townsend & Gordon, 1993). Townsend and Gordon (1993) provided more robust evidence of the existence of such a threshold, whilst more recent studies have continued to support Townsend’s original idea (Tomlinson et al., 2008). Ferragina et al. (2013) examined the specific association between income and participation, with their findings suggesting the existence of a break point. Latent variable approaches seem to offer similar conclusions. Item response theory (IRT) models, for instance, which assess a continuous latent construct, tend to show a break in the item response curves, i.e., the severity parameter (location on the x-axis) is similar for items with high severities and then there is often a gap between these high-severity deprivation items and deprivations with a lower severity (Guio et al., 2016; Szeles & Fusco, 2013). The existence or lack of a break point in the context of an IRT model nevertheless remains unclear, as this has not yet been treated as a research question worthy of analysis. However, IRT models confirm that reliable deprivation indices tend to consist of a few items that measure ‘mild’ poverty (i.e., between zero and one standard deviation below the average living standard), a larger number of items that measure moderate to severe poverty (i.e., between one and three standard deviations) and only a few or no items which measure very severe poverty (i.e., three or more standard deviations below the average living standard). Thus, the average person experiences no or only minor levels of deprivation but, as living standards fall, the number of deprivations increases rapidly, as measured by a reliable deprivation index. Thus, although living standards may, in theory, be continuous, deprivation is not, since it measures low living standards only. Consequently, a break point or poverty threshold is an inherent property of the deprivation index methodology.

Empirical support for the existence of a poverty line should not be confused with the assessment of the location of the line itself. In other words, there is a difference between finding that a curve bends and finding where exactly this happens. For instance, poverty research assumes that there is a line from which the poor and the not-poor can be adequately distinguished but proposals regarding where to draw this line are fairly new, not widely adopted and still lack rigorous empirical scrutiny. The hypothesis behind the existence of a poverty line justifies the concept of a group of poor people who can be distinguished from the rest of the population in a given society. The location of the poverty line will affect the prevalence of poverty. The following section describes a number of proposals that have been suggested in the literature to identify the optimum poverty line.

3 Methods Employed to Find the Poverty Line Under the Intersection and Deprivation-Only Approaches

Several empirical methods in the literature assess cut-points, given a univariate or a joint distribution: latent class analysis, cluster analysis, discriminant analysis, receiver operating characteristic (ROC) curves, classification error models and univariate analysis. This paper is concerned with a subset of methods that have been proposed to find k, in order to determine a cut-point under the intersection approach (which typically characterises the consensual approach but is used in other applications) and/or to find a cut-point when only deprivation data are available. In particular, we focus on the comparison of the Bristol Optimal Method with the proposal made by Babones et al. (2015), namely the use of a count-data distribution (Poisson). This comparison is expanded here by including negative binomial distributions, zero-inflated models and univariate mixture models. The paper does not cover other types of comparison, such as the classification error models proposed by Notten and Kaplan (2022) and Nájera (2023), as this would require a somewhat different type of simulation study; the simulations here are based on latent variable models.

3.1 The Bristol Optimal Method (BOM)

The BOM draws on Townsend’s theory of relative deprivation and on the hypothesis that there is a break point in the income distribution from which deprivation increases substantially (see Sect. 2). Therefore, the method relies on both income and the deprivation index score to find k. The origins of this method can be traced back to the first visual attempts to find the inflection point (Townsend, 1979). The way in which the inflection point has been assessed has changed over time thanks to increases in computing power and improvements in statistical methods. For example, Townsend (1979) visually inspected graphs, Mack and Lansley (1985) used correlation coefficients and Townsend and Gordon (1993) used cluster and discriminant analyses.

The current BOM was developed to analyse the 1999 Poverty and Social Exclusion (PSE) survey (Pantazis et al., 2006). The method uses generalised linear models (GLMs), namely ANOVA and logistic regression, to assess which grouping in the deprivation distribution provides the best split (k) between the ‘deprived’ and the ‘not-deprived’ (Gordon, 2010). The GLM approach uses, as the response variable, groups of people for different ks (0 v 1+, 0–1 v 2+, 0–2 v 3+, etc.) and income as the key explanatory variable (plus control variables to adjust the estimates). The ANOVA approach sets the data the other way around, with income as the response variable and, as the key explanatory variable, a binary indicator of groups of people suffering from a given number of deprivations. A series of models is fitted and the model with the best fit is chosen to provide the k value that best groups the population into poor and not-poor classes. Once k has been identified, the Z cut-off can be set by looking at the corresponding value on the income distribution. In this way, the BOM jointly defines both k and Z under the intersection approach. Figure 1 illustrates this approach, where a high deprivation score represents a low standard of living:

Fig. 1 Conceptualisation of the relationship between deprivation and resources (approximated with income). The figure represents a double cut-point: k on the standard of living domain and Z on the resources domain. The optimal split leads to a meaningful identification of the poor, relative to the not-poor
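To make the model-comparison step concrete, the following is a minimal sketch of the BOM scan in R under assumptions of our own: hypothetical variable names (dep_score for the deprivation count, income for equivalised income), a log-income transformation and AIC/F statistics as the fit criteria. The published implementation also adjusts for control variables, which are omitted here.

```r
# Minimal sketch of the BOM model-comparison step (hypothetical variable names:
# dep_score = deprivation count, income = equivalised household income).
# For each candidate k the sample is split into 'deprived' (score >= k) and
# 'not deprived' (score < k) and two GLMs are fitted:
#   logit:  deprived-group indicator ~ income   (logistic regression)
#   ANOVA:  income ~ deprived-group indicator   (one-way ANOVA)
# The k with the best-fitting models (lowest logit AIC, largest ANOVA F) is
# taken as the candidate optimal split.
bom_scan <- function(dep_score, income, ks = 1:5) {
  res <- lapply(ks, function(k) {
    grp <- as.integer(dep_score >= k)                 # 1 = deprived, 0 = not
    logit_fit <- glm(grp ~ log(income), family = binomial)
    anova_tab <- anova(lm(log(income) ~ factor(grp)))
    data.frame(k = k,
               logit_aic = AIC(logit_fit),
               anova_F   = anova_tab$`F value`[1])
  })
  do.call(rbind, res)
}

# Illustration with simulated data containing a break point around k = 2
set.seed(1)
n         <- 5000
poor      <- rbinom(n, 1, 0.3)
income    <- exp(rnorm(n, mean = ifelse(poor == 1, 8.5, 9.3), sd = 0.4))
dep_score <- rpois(n, lambda = ifelse(poor == 1, 3.5, 0.6))
bom_scan(dep_score, income)
# Once k is chosen, Z can be read off the income distribution at the threshold.
```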

3.2 Poisson-based Framework

Babones et al. (2015) assume that a deprivation score follows a Poisson process, as it is a count variable that empirically tends to be skewed to the right. Every observed Poisson distribution has a theoretical counterpart, and any discrepancy between the two suggests a mismatch between the number of expected cases and the actual figure. In practice, this approach takes the observed deprivation score and fits a theoretical Poisson distribution using the parameters of the observed one. Next, the misfit between the theoretical and the empirical distribution is assessed to find the best possible split. Hence, this is not a GLM that uses a Poisson link. For Babones et al. (2015), when the empirical distribution is higher than the theoretical one, the number of deprived people is greater than would be expected and, therefore, these people should constitute the ‘poor’ group. This approach also assumes that the mean and the variance of the deprivation score are equal (i.e., there is no over-dispersion), an assumption that is unlikely to hold in practice. When the mean differs from the variance, the theoretical Poisson distribution is likely to fit poorly, and consequently the k value will be incorrect. Babones et al. (2015) did not use confidence intervals to assess significant differences between the observed and the theoretical distributions.

This paper used the R package ‘fitdistrplus’ to fit the Poisson distribution (Delignette-Muller & Dutang, 2015) and estimated k using confidence intervals. The fit of the theoretical distribution relative to the observed data plays a decisive role in understanding the performance of the Poisson-based framework when seeking to identify the poverty line. In essence, the better the fit, the more likely it is to find the true k. However, this is a guideline and not something that has been formally addressed or proven (see next section). The performance of this approach is assessed using the following statistics: chi-square, the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the G-test (Sokal et al., 1969), the Kullback–Leibler (KL) divergence statistic for discrete distributions (MacKay, 2003) and the James–Stein shrinkage (JS) estimator (James & Stein, 1961). Both KL and JS were computed using the ‘entropy’ R package (Hausser et al., 2009).
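As an illustration, the sketch below applies a Babones-style rule with ‘fitdistrplus’ and ‘entropy’ to a hypothetical deprivation count: a theoretical Poisson is fitted to the observed score and k is taken as the first score in the right tail at which the observed proportion exceeds the expected one. The data, the tail rule and the use of the KL divergence as a fit diagnostic are our own simplifications, not the original authors’ code.

```r
library(fitdistrplus)   # fitdist()
library(entropy)        # KL.plugin()

# Hypothetical, over-dispersed deprivation count for illustration only
set.seed(2)
dep_score <- c(rpois(7000, 0.6), rpois(3000, 3.5))

pois_fit <- fitdist(dep_score, "pois")                 # theoretical Poisson
lambda   <- pois_fit$estimate["lambda"]

scores   <- 0:max(dep_score)
observed <- as.numeric(table(factor(dep_score, levels = scores))) /
            length(dep_score)
expected <- dpois(scores, lambda)
expected <- expected / sum(expected)

# Rule: first score above the mean where the observed proportion exceeds the
# theoretical one, i.e. more deprived people than a Poisson process predicts
k_hat <- scores[which(observed > expected & scores > lambda)[1]]
k_hat

# Fit diagnostics: the worse the Poisson fit, the less trust to place in k_hat
c(AIC = pois_fit$aic,
  BIC = pois_fit$bic,
  KL  = KL.plugin(observed + 1e-12, expected + 1e-12))
```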

3.3 Negative Binomial-based Framework

This is a more general framework that draws on the work of Babones et al. (2015). The Poisson distribution is a special case of the negative binomial distribution, and a negative binomial model has one more parameter than a Poisson model, which allows the variance to be adjusted independently of the mean. Thus, the negative binomial framework also assumes that the discrepancy between the theoretical distribution and the empirical data identifies the poor group, but it does not assume that the variance is equal to the mean. Therefore, this is a more general framework of analysis and it should, in theory, be more successful in determining k. The R package ‘fitdistrplus’ was used to fit the negative binomial distribution and the zero-inflated model, where applicable (Delignette-Muller & Dutang, 2015). The performance of this framework is also assessed using the statistics of fit.
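The negative binomial analogue only changes the fitted distribution. The sketch below, again with hypothetical data, shows the same discrepancy rule and an AIC comparison against the Poisson fit; the zero-inflated variant is not sketched here.

```r
library(fitdistrplus)

set.seed(2)
dep_score <- c(rpois(7000, 0.6), rpois(3000, 3.5))     # hypothetical count
scores    <- 0:max(dep_score)
observed  <- as.numeric(table(factor(dep_score, levels = scores))) /
             length(dep_score)

# The negative binomial allows the variance to exceed the mean
nb_fit   <- fitdist(dep_score, "nbinom")               # estimates 'size', 'mu'
expected <- dnbinom(scores, size = nb_fit$estimate["size"],
                    mu   = nb_fit$estimate["mu"])
expected <- expected / sum(expected)

# Same discrepancy rule as in the Poisson framework, applied to the right tail
k_hat_nb <- scores[which(observed > expected & scores > mean(dep_score))[1]]
k_hat_nb

# With over-dispersed data the NB should fit better (lower AIC) than the Poisson
c(nbinom_aic = nb_fit$aic, poisson_aic = fitdist(dep_score, "pois")$aic)
```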

3.4 Univariate Mixture Approach

Mixture models are model-based clustering methods employed to discern subgroups of individuals even in the absence of a variable that assigns everyone to a subgroup. Univariate mixture approaches are helpful for examining the existence of sub-populations within a distribution (McLachlan & Peel, 2004). Implementation of mixture analyses is fairly recent, and most of the available software has coded mixture analysis for normal distributions and some semi-parametric cases. In this instance, the EM algorithm-based implementation of the R package ‘mixtools’ (the normalmixEM2comp function) was used (Benaglia et al., 2009), which assumes the existence of two components (groups) within a univariate normal distribution. This assumption is not ideal, however, given that deprivation scores tend to be skewed. Nevertheless, mixture approaches are still hard to implement and this offers a simple framework for starting to explore them as a potential modelling strategy. From a conceptual perspective, k should lie at the point at which the two mixtures become distinguishable, i.e., where the number of people in distribution ‘b’ (poor) is higher than the number of people in distribution ‘a’ (not poor), which will be the point where the two mixture distributions overlap. Two criteria were used:

Mixture 1: When the cumulative probability of the poor group exceeds the cumulative probability of the not-poor group, i.e., the point at which the lines cross and the poor have a higher probability than the not-poor given k (see Figs. 9 and 10, below). For example, if at k = 3 the cumulative distribution of the poor is higher than the cumulative distribution of the not-poor, this means that, for this particular value, people are more likely to be poor than not poor.

Mixture 2: When the cumulative probability of the poor group is zero, i.e., the point at which it is very unlikely that the poor group exists for a given k. This is a conservative test that is likely to underestimate the number of poor people, i.e., the mixture that fully covers the poor population with no overlap with the not-poor mixture distribution.
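The sketch below shows one way of operationalising the two criteria with the ‘mixtools’ implementation mentioned above. It evaluates the weighted normal component densities at each score and reads off the crossing point (Mixture 1) and the point at which the not-poor component has practically vanished (Mixture 2). The 1e-3 tolerance, the starting values and the simulated data are arbitrary choices of ours, not part of the original proposal.

```r
library(mixtools)   # normalmixEM2comp()

# Hypothetical deprivation count; the normality assumption is a simplification,
# since deprivation scores are skewed (see text)
set.seed(3)
dep_score <- c(rpois(7000, 0.6), rpois(3000, 3.5))

fit <- normalmixEM2comp(dep_score,
                        lambda  = 0.5,           # starting mixing proportion
                        mu      = c(1, 4),       # starting component means
                        sigsqrd = c(1, 1))       # starting component variances

scores   <- 0:max(dep_score)
not_poor <- which.min(fit$mu)                    # component with the lower mean
poor     <- which.max(fit$mu)

# Weighted component densities evaluated at each deprivation score
d_not_poor <- fit$lambda[not_poor] *
              dnorm(scores, fit$mu[not_poor], fit$sigma[not_poor])
d_poor     <- fit$lambda[poor] *
              dnorm(scores, fit$mu[poor], fit$sigma[poor])

# Mixture 1: first score at which the poor component overtakes the not-poor one
k_mix1 <- scores[which(d_poor > d_not_poor)[1]]

# Mixture 2 (conservative): first score beyond the not-poor mean at which the
# not-poor component has essentially vanished
k_mix2 <- scores[which(d_not_poor < 1e-3 & scores > fit$mu[not_poor])[1]]

c(k_mix1 = k_mix1, k_mix2 = k_mix2)
```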

4 Simulation Design and Strategy

This section presents the simulation strategy used to produce the poverty indices. The first subsection describes, step by step, how a realistic deprivation index was created and how the poor and not-poor groups were defined (given a k value). The second subsection is concerned with the different conditions under which each method was assessed, i.e., different sample sizes, numbers of indicators and weaker or stronger relationships between resources (approximated via income) and deprivation, which is at the heart of Townsend’s theory (Fig. 2).

Fig. 2 Clear Townsend break point at k = 2, with high τ and µ intensity. This figure was produced from simulated data

A Monte Carlo study based on a series of factor mixture models (FMMs) was utilised in order to assess the reliability of existing methods to determine the poverty threshold. The strategy to simulate the poor and the not-poor populations consisted of the following four steps. All simulations were conducted using Mplus 7.2 (Muthén & Muthén, 2012).

Step 1. The first step consisted of producing a reliable poverty measure, following the methodology of Guio et al. (2016) and the Monte Carlo settings of Nájera (2019) (thus, this study does not provide answers for unreliable poverty indices). For this purpose, a two-parameter item response theory (IRT) model was used to simulate the poverty indices (Hambleton & Swaminathan, 1991). Two-parameter IRT models have been used in poverty research to assess the measurement properties of an index (Guio et al., 2017, 2018; Najera, 2016; Szeles & Fusco, 2013). A two-parameter model permits the estimation of both the severity of each indicator and its capacity to discriminate between the poor and the not-poor (see Harris, 1989, for a simple explanation). A number of items i (i.e., binary deprivation indicators) with different severities were simulated to produce the poverty measure. The number i varied in order to test how each method performs with different numbers of deprivation items (see below). The simulated item characteristic curves had different locations (severities) and different slopes (discrimination), kept within reasonable limits matching values from the empirical literature (severities between −1 and −3, i.e., low standards of living; discriminations > 0.4 in standardised loadings), based on Guio et al. (2016) and Nájera (2019).

Step 2. The second step introduced a mixture component to the two-parameter IRT model. A factor mixture model (in this case, a mixture IRT model) was used for this purpose (Lubke & Muthén, 2005, 2007) in order to produce the poor and not-poor groups. These were defined by setting different probabilities of being deprived of each of the i items. These probabilities were set in accordance with the severity of deprivation, i.e., the more severe the item, the higher the chances that the poor group would lack it relative to the not-poor group.

Step 3. The third step established a break point in the latent class probabilities (i.e., k). The items were ordered according to their severity value (from low to high) and then a very low probability of endorsement was assigned to the not-poor group, given k. For example, if the simulated measure had nine items and k was set equal to 3+, the probability of endorsing items 3 through 9 was very low for the not-poor group and high for the poor group. This is consistent with the empirical evidence, in which the probability of endorsement seems to vary by severity and by the level of resources (Deutsch et al., 2015; Najera, 2016). For example, the Mexican official poverty measure suggests that the not-poor are extremely unlikely to lack the most severe deprivation items, such as basic education or adequate flooring, and more likely to lack the least severe deprivation items, such as access to social security.

Step 4. The last step established a break point in the income distribution. Different correlation levels between each class, item and income were utilised for this purpose, which reproduced the theoretical association predicted by Townsend between resources, severity and multiple deprivation (see Sect. 2). Items with high severity were associated with low income levels and vice versa. Next, the mean income of the poor and the not-poor classes was set at different levels, whereby the poor always had a lower income (see below). If k was 3+, then a sharp drop in the mean income was introduced between deprivation scores 2 and 3, after which the decrease in income was compressed (i.e., a lower decrease rate). Changes to correlation levels and mean incomes were applied to assess how the BOM and other methods performed when the non-linear relationship was weak and very weak (see below).
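The simulations themselves were run in Mplus; the following is only a stylised base-R reproduction of the four steps, with hypothetical parameter values, in which the class separation on the latent trait stands in for the explicit class-specific item probabilities of steps 2 and 3.

```r
# Stylised reproduction of steps 1-4 (not the Mplus specification used here)
set.seed(4)
n       <- 10000
n_items <- 8
true_k  <- 2                        # break point in the profiles
p_poor  <- 0.30                     # poverty rate (psi)

# Step 1: item parameters within empirically plausible ranges
severity       <- seq(-1, -3, length.out = n_items)    # item locations
discrimination <- runif(n_items, 1.0, 2.0)             # item slopes

# Steps 2-3: two latent classes with different latent living standards,
# so that only the poor class tends to endorse the more severe items
poor  <- rbinom(n, 1, p_poor)
theta <- rnorm(n, mean = ifelse(poor == 1, -2, 0.5), sd = 0.7)

items <- sapply(seq_len(n_items), function(j) {
  p <- plogis(discrimination[j] * (severity[j] - theta))   # 2PL probability
  rbinom(n, 1, p)
})
dep_score <- rowSums(items)

# Step 4: mean income falls sharply up to the break point and more slowly
# afterwards, with the poor class always poorer on average
drop   <- ifelse(dep_score <= true_k,
                 0.35 * dep_score,
                 0.35 * true_k + 0.08 * (dep_score - true_k))
income <- exp(9.3 - drop - 0.4 * poor + rnorm(n, sd = 0.3))

aggregate(income, list(dep_score = dep_score), mean)    # should bend at k = 2
```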

Differential weighting (i.e., one deprivation rendered more important than another by inflating it with a constant) was not considered when generating the poverty indices. This was unnecessary as the simulated indices are all highly reliable (i.e., highly correlated with the true poverty score) and differential weighting, even if it were done perfectly, would thus add little additional information to the index, i.e., reliable indices are self-weighting (Gordon et al., 2012; Nájera, 2019). All the simulations assumed that missing data are random and that income had been correctly equivalised (i.e., adjusted for economies of scale by household size and composition). This paper therefore does not consider how methods that rely on income are affected by using different equivalence scales, as this is beyond its scope. It also does not consider the effect that missing data might have on the results.

4.1 Changing Simulation Parameters

The simulations considered the basic conditions that have been observed to change across samples and years in poverty studies (see Guio et al., 2017, 2018). A mixture of the following changing parameters was simulated in the data:

  • Sample size (n): Three sample sizes were considered (n = 500, n = 1000, n = 10,000).

  • Number of items (i): the number of indicators considered for producing the deprivation score (i = 8, i = 15).

  • Different k: Different k values were considered (k = 1, k = 2, k = 3, k = 4, k = 5). Deprivation scores tend to concentrate at low values, i.e., zero- or one-inflated in developed countries, and to be less concentrated in less developed countries.

  • Different poverty rates (ψ): Low, medium and high poverty prevalence rates (ψ = 15%, ψ = 30%, ψ = 45%).

  • Break point for income and deprivation (τ intensity): A negative relationship between income and deprivation was modelled and two cases were considered: a clear break point (τ ≥ 1.4), meaning a marked change in the mean fall in income (Fig. 1), which follows Townsend’s break point hypothesis; and low tau intensity (0 ≤ τ < 1.4, an unnoticeable break point), meaning that the mean fall in income was virtually the same before and after the break point (Piachaud’s contention; Piachaud, 1987).

  • Mean income difference (µ split): This parameter changes how strong the mean income difference is between the two groups. A high split (µ > 1.3) means that the average income of the not-poor is at least 30% higher than that of the poor group; a low split (µ ≤ 1.3) indicates that the mean difference is less than 30%.

  • Jumps in the deprivation score at k (δk−1): Poisson, negative binomial distributions and univariate mixture models are likely to be sensitive to changes in the shape of the distribution and, more importantly, to differences between the observed and expected values (Poisson and NB). For example, if at the true k there is a large drop of δk−1, the NB and Poisson are likely to fail, as the observed values are likely to be lower than the expected ones. To assess how sensitive these methods are, a sharp increase was followed by a series of changes of the same magnitude (monotonic increases). The following situations were considered: (a) low decrease from k−1 to k (Poisson process) (δk−1 < 0.10) and (b) sharp decrease (δk−1 ≥ 0.10).

Multiple combinations of the above parameters apply, and it is rather difficult to grasp the most important differences across the simulations. Table 1 summarises the different characteristics of the simulated poverty indices. The simulations can be classified into five main ordered types (A, B, C, D and E), where A fulfils Townsend’s break point and E is the hypothetical case of a deprivation index with a rather weak relationship with resources. Each simulation falls within one of these five categories. However, due to the stochastic nature of the exercise, some can be classified into more than one group.

Table 1 Summary of the qualitative aspects of the simulated models and their parameters

This study draws on Townsend’s hypothesis of the existence of a break point in the distribution of resources below which deprivation rises substantially. This is represented in Fig. 2, where each dot represents the mean value of the resources of each deprivation group. In this case, the poverty line would be equal to k = 2, with a sharp decrease in income (τ) and a substantive difference in mean income (µ). It is important to keep in mind that k is an unobservable parameter and therefore the usefulness of parameters such as δk−1 is limited in practice. The example using real data from Mexico illustrates this limitation (Sect. 5.1). A total of 1,000 replications were used for the Monte Carlo experiment.
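The paper does not give closed-form definitions of τ, µ and δk−1. The sketch below therefore shows one plausible way of approximating them from observed data (a ratio of the income falls just before and just after k, the ratio of mean incomes of the two groups, and the change in the population share between scores k−1 and k), assuming hypothetical dep_score and income vectors and that the scores k−1, k and k+1 are all observed.

```r
# Illustrative estimates of tau, mu and delta_{k-1} (not the paper's exact
# definitions), given a deprivation count, an income vector and a candidate k
diagnostics <- function(dep_score, income, k) {
  mean_inc <- tapply(income, dep_score, mean)       # mean income by score
  s        <- as.numeric(names(mean_inc))

  # tau: fall in mean income just before k relative to the fall just after k;
  # values well above 1 indicate a Townsend-style break point
  fall_before <- mean_inc[s == k - 1] - mean_inc[s == k]
  fall_after  <- mean_inc[s == k] - mean_inc[s == k + 1]
  tau_hat     <- as.numeric(fall_before / fall_after)

  # mu: mean income of the not-poor (score < k) over that of the poor (>= k)
  mu_hat <- mean(income[dep_score < k]) / mean(income[dep_score >= k])

  # delta_{k-1}: drop in the population share between scores k-1 and k
  prop      <- prop.table(table(factor(dep_score, levels = 0:max(dep_score))))
  delta_hat <- as.numeric(prop[as.character(k - 1)] - prop[as.character(k)])

  c(tau_hat = tau_hat, mu_hat = mu_hat, delta_hat = delta_hat)
}

# Example call (using the dep_score and income simulated above):
# diagnostics(dep_score, income, k = 2)
```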

5 Results

The results of the study are organised according to the k used for the simulations (from k = 1 to k = 5). Table 2 shows the results for the set of simulations using k = 2 as the true split, n = 10,000 and poverty rates ψ = 15%, ψ = 30% and ψ = 45% (the results for the other sample sizes are included in “Appendix 2”). The first three columns display the values of the parameters δk−1, τ and µ. Models M1_8_2 to M3_8_2 (the notation means: models M1 to M3, with eight deprivation items and k = 2) were fitted using different τ and µ, ranging from a clear break point to an almost straight line. These parameters are relevant for the BOM, as they modify the relationship between income and deprivation. The other two models with eight items, M4_8_2 and M5_8_2, change the value of δk−1 from very low to low. It is expected that this will favour both the NB and the Poisson approaches, as the observed number of cases increases when δk−1 approaches zero. This sequence is replicated for the models with 15 items (M1_15_2 to M5_15_2), the first three of which assess the break point and the others δk−1 (the change in the shape of the distribution). Table 2 also displays the suggested value for k when the model failed to find the true k. The proportion of failures is displayed in the error rate column, i.e., 0.7 means that 70% of the models failed to find the true k.

Table 2 Results for true k = 2, n = 10,000, ψ = 30%, ψ = 45% and ψ = 15%

Table 2 suggests that the BOM (logit and ANOVA) works well when the break point is reasonably high (τ > 1.3), whilst ANOVA becomes rather unreliable for indistinct break points and low mean differences. Both the Poisson and the negative binomial fail when the drop (δk−1) from k−1 to k is high (> 0.1). This is because the expected number of cases remains higher than the observed number and k is overestimated (see the last row of simulations). When the fall is small (< 10%), the negative binomial and Poisson seem to perform much better. Univariate mixture cluster analysis (criterion a) performs well when the drop from k−1 to k is high (unlike the negative binomial and Poisson). This method seems more sensitive to the number of items, as more items improve its performance. When this method fails, it tends to overestimate k, which is reasonable, given that the skewed distribution leads to very few cases for the poor group and the mixture fails to fit the data. All methods are sensitive to small sample sizes (see “Appendix 2”). For n = 500, all methods become unreliable but the effect is particularly acute for the Poisson and negative binomial approaches as well as for univariate mixture cluster analysis.

Regarding changes in the prevalence of poverty, Figures 3 and 4 show the effects of changes to the deprivation rate for ψ = 15%, ψ = 30% and ψ = 45%. These two plots show the relationship between ψ and δk−1. The values of δk−1 depend on ψ, which only causes a displacement of the distributions. The plots reveal that what matters is the size of the jump in the deprivation score (δk−1) and not the deprivation rate (ψ). Since δk−1 captures the effect of different ψ, the simulations considering different ψ have been omitted for simplicity from Tables 3, 4 and 5, although the remaining results can be consulted in “Appendix 2”. The conclusions are not altered by a higher poverty rate (ψ). However, especially for the univariate models, the distribution of the deprivation score is important. When the deprivation score concentrates large shares of the population at two or more values (Figure 4), two or more peaks are likely to appear, which could happen in countries with very high poverty and inequality or in countries where some needs (e.g., health care) are conditional on access to the formal labour market. These kinds of distributions are very likely to negatively affect the reliability of the methods. However, such special cases were not thoroughly assessed in this study.

Fig. 3 Displacement of curves for the same k. Deprivation rates = 15%, 30% and 45%

Fig. 4 Displacement of curves for the same k. Deprivation rates = 15%, 30% and 45%. Positive \(\hat{\delta}_{k-1}\)

Table 3 Results for true k = 3, n = 10,000, ψ = 30%
Table 4 Results for true k = 4, n = 10,000, ψ = 30%
Table 5 Results for true k = 5, n = 10,000, ψ = 30%

Table 3 presents the results of the simulations considering k = 3. This table is analogous to Table 2 in the sense that it uses three models to assess changes in the break point and two models to assess δk−1. When the break point is clear, the BOM performs very well. Low τ (lack of a clear break point) and low µ (little income difference between the ‘poor’ and ‘not-poor’ groups) affect the BOM approach. It seems that logit is a more robust approach than ANOVA under these conditions, which is an unexpected finding, given the lower power of logit models compared with ANOVA.

The results are quite similar to the previous findings (Table 2). A high δk−1 affects both count-based approaches (negative binomial and Poisson), as these models fail to detect the correct value of k. Conversely, for low δk−1, the performance of the negative binomial improves. However, model M4_15_3 has some problems, as 50% of the time the negative binomial model fails to detect the correct value of k (i.e., 3). For a higher k, it seems that the drop needs to be smaller compared with the values of the drop used for k = 2. The Poisson model fails under these conditions. The univariate mixture model approach works well, while the NB approach fails with a higher number of deprivation index items. The first UMA model works well when δk−1 is high. When this is not the case, the UMA approach is likely to underestimate the true value of k with model 1 and overestimate it with model 2. All methods perform poorly for small samples.

Table 4 illustrates the results of the simulations using k = 4. Overall, the results suggest that the BOM approaches are only unreliable when τ and µ are low. This result is similar to the k = 2 and k = 3 results. Logit will underestimate k and ANOVA will overestimate k in situations in which τ and µ are small, as was also the case for k = 2 and k = 3. The negative binomial performs well when δk−1 is very low. As k increases, it seems that the NB becomes less sensitive, and differences between the expected numbers of not-poor and poor need to be very small; otherwise, the NB will overestimate k. However, the univariate mixture model, which is quite sensitive to the number of items, seems to perform better when δk−1 is high. A low number of items (in this case fewer than eight) produces convergence problems when using the univariate mixture approach.

Table 5 (below) displays the results when k = 5. The results for the BOM are the same as on previous occasions, in that logit and ANOVA fail to find k only when τ and µ are very small. In this case, ANOVA seems to perform a bit better: logit models underestimate k and ANOVA overestimates it. Furthermore, the UMA models fail to produce good results for eight items, which is understandable as, for such a high cut-off (> 50% of the possible ks), it is not possible to properly fit another curve. The UMA works well for 15 items. In this case, it seems that it becomes less sensitive to low values of δk−1 and that this is related to having a less skewed distribution because, when this is zero- or one-inflated, the UMA becomes unreliable. The first model (Mix 1) underestimates k by 2 and the second model (Mix 2) overestimates it by 1.

Table 6 presents a crude summary of the findings of the simulations. Each row represents one of the five cases described in Table 1. It reveals that the univariate models fail in almost every situation, although the simulations show that, in some specific circumstances, these methods have a chance of finding the optimal poverty line. Overall, however, the likelihood of success is rather low. The multivariate models are more likely to work well in almost all circumstances as long as Townsend’s hypothesis of the relationship between resources and deprivation holds.

Table 6 Summary of the main findings

5.1 Real Data Example

Data from the Mexican Household Income and Expenditure Survey (HIES) 2012 and 2014 are used to illustrate how the findings of this paper might be applied. A valid and reliable 12-item deprivation index was computed using the official Mexican multidimensional poverty measure as a reference (CONEVAL, 2011). The measure combines income with indicators of material deprivation organised into six dimensions: social security, health care, housing conditions, availability of basic public services, education and food security.

Figures 5 and 6 plot the mean adjusted income per capita by deprivation score for 2012 and 2014, respectively. Both plots show that the relationship between income and deprivation barely changed from 2012 to 2014. In both cases, Townsend’s prediction regarding the existence of a break point is clearly visible. There are also similarities with respect to the possible cut-off \(\hat{k}\) (the observed k): beyond this point, income drops less dramatically for people with more than two deprivations.

Fig. 5 Mean income per capita by deprivation score, 2012

Fig. 6 Mean income per capita by deprivation score, 2014

All methods were applied to the Mexican data. Figures 7 and 8 plot the observed deprivation score (dark bars) and the fitted (expected, grey bars) values from the negative binomial distribution for 2012 and 2014. In both years, the observed distribution is heavily concentrated at low scores. There is also an increase in the number of cases from zero to one deprivation in both years. However, it is at five or more deprivations that the observed distribution is consistently higher than the theoretical one. Consequently, the \(\hat{k}\) value for both years is five. The sharp drop from one to two deprivations results in \(\hat{\delta}_{k-1}\) > 0.10. From the Monte Carlo results, we know that the NB method cannot identify the correct value of k under these circumstances. When the NB does not perform well, a conservative rule might be to set the threshold at \(\hat{k}\) − 2, which would result in a deprivation threshold of \(\hat{k}\) = 3.

Fig. 7 Negative binomial: fitted and observed distribution, 2012

Fig. 8 Negative binomial: fitted and observed distribution, 2014

The univariate mixture approach seems to be more consistent with the results shown in Figures 5 and 6. Figures 9 and 10 present the univariate mixture distributions for 2012 and 2014. From the simulation results, the first model (Mix 1) works well when δk−1 is high. The simulations also suggest that the second model (Mix 2) is likely to overestimate k by 1 and that \(\hat{k}\) − 1 should therefore be applied when using this criterion. Accordingly, the univariate mixture models suggest that the optimal \(\hat{k}\) should be equal to 2 for 2012 and 3 for 2014. However, it is evident that, for this \(\hat{k}\), the probability of belonging to the not-poor group is quite low. As previously discussed, the univariate mixture model results should be treated as a conservative test of the true value of k, which may underestimate the poverty rate.

Fig. 9 Univariate mixture model results, 2012. Each line denotes one of the mixture distributions

Fig. 10 Univariate mixture model results, 2014. Each line denotes one of the mixture distributions

Table 7 shows the results of the BOM along with the results of the other approaches. Logit and ANOVA lead to the same result (k = 2). In this case, \(\hat{\tau}\) and \(\hat{\mu}\) are quite high and therefore these two methods lead to the same solution. In a country such as Mexico, with large income inequality, the value of \(\hat{\tau}\) is quite high and so it is very likely that \(\hat{k}\) is equal to k.

Table 7 Real data analysis results. Mexico 2012 and 2014

6 Conclusions

Drawing upon Townsend’s (1987) theory of relative deprivation, this paper examined the reliability of some existing empirical methods for finding the ‘best’ poverty threshold k, for measures using the intersection approach, i.e., combining a proxy of resources with a deprivation score, and for measures relying exclusively on a deprivation score. Townsend’s theory suggests that poverty is the lack of command of adequate resources over time, with deprivation as one of its consequences. Moreover, it predicts that the relationship between income (resources) and deprivation is non-linear and that there is a break point at which the latter increases substantially with any additional fall in the former. This point k optimally distinguishes the poor from the not-poor and therefore represents the optimal poverty line, i.e., the line for which the prevalence of poverty is most likely to be calculated correctly.

The paper also assessed the conditions under which the Bristol Optimal Method (logistic regression and ANOVA), the Poisson-based framework, the negative binomial-based framework and the univariate mixture approaches fail or succeed in finding the optimal value of k. Drawing upon Townsend’s theory, a Monte Carlo experiment was used to assess how different sample sizes, numbers of deprivation items, strengths of the relationship between income and deprivation, and changes in the poverty rate affected each method.

The results suggest that the Bristol Optimal Method (BOM) outperforms the other methods when Townsend’s theory holds. However, if there is no clear change in the slope at the break point and the income differences between the poor and not-poor groups are minimal, logit models are likely to underestimate k and ANOVA to overestimate it. We suggest not only inspecting the relationship between income and deprivation visually but also computing the changes in the slope for the obtained k. When ANOVA and logit differ, the advice is to compute \(\hat{\tau}\) and \(\hat{\mu}\) for each solution and for the values of k in between the two; in these circumstances, the correct value of k may lie between the ANOVA and logit results. The results also indicate that a sample of at least 1,000 households/people is recommended for this kind of poverty threshold analysis.

Income or expenditure measures are not included in some surveys and, therefore, the BOM cannot be applied in these circumstances unless a proxy of resources, such as education or labour market position, is used as an ancillary variable. The Poisson-based and negative binomial-based frameworks, as well as the univariate mixture approach, are some of the alternative methods that can be employed to find k. The conceptual basis of the Poisson-based approach is reasonable (Babones et al., 2015) but its assumptions are unlikely to hold, as most deprivation scores will exhibit over-dispersion (variance greater than the mean). The negative binomial distribution is less unreliable than the Poisson-based approach when attempting to find k, and the results reveal that it almost always has a better fit than the Poisson-based framework. Nonetheless, neither of these two approaches is recommended for use in practice, as they are very likely to lead to incorrect conclusions about the best possible split.

The univariate mixture approach works well when the deprivation score is not zero-inflated, \(\hat{\delta}_{k-1}\) is high and the number of items is at least 10. This approach is useful in narrowing down the possible k, which seems likely to fall between the \(\hat{k}\) suggested by mixture model 1 (underestimation) and that suggested by mixture model 2 (overestimation).

The main conclusion, overall, is that the four univariate methods are unlikely to lead to reliable results and are based on very strict assumptions. If researchers want to use these methods, at least all the precautions above need to be considered. This underlines the value of including proxies of resources (e.g. validators) whenever possible (education, occupation scales or even measures of subjective well-being) as, the more information, the better.

The following issues need to be considered in future work: (1) the effect of different equivalence scales upon income and its relationship with deprivation; (2) the existence of two (or more) inflection points; and (3) sensitivity to outliers and the effect of population weights upon the estimates. In addition, population weights will affect τ, µ and δk−1.

More generally, two main problems need to be addressed. The first is the effect of measurement error upon classification. Hausman et al. (1998) discussed how measurement errors are propagated in linear and categorical models. Second, we need to determine how to find k for variants of the union approach that combine two or more domains. Some multidimensional indices produce a joint weighted score; for example, the OPHI-MPI uses three dimensions but produces a single deprivation score and sets k at a certain percentage of the joint distribution, making it a specific case of a deprivation-only score. One could use auxiliary data to find k on its weighted score, using any of the methods analysed in this paper. However, this index rests on a different framework, and this paper is concerned with Townsend’s theory and the empirical regularity of the relationship between resources and deprivation.

Other potentially attractive methods need to be considered. Hybrid models such as Poisson-based mixture cluster analysis, Bayesian univariate mixture analysis and multivariate mixture analysis are potential candidates when there is a dearth of ancillary information (Frühwirth-Schnatter, 2006). These methods only require the observed deprivation pattern and can be implemented in open-source software such as R. Bayesian methods can accommodate prior information on the possible split. Multivariate mixture analysis is a further option when auxiliary data are available. Factor mixture modelling and confirmatory latent class analysis are also options when information exists on income and deprivation or only on deprivation. Although these methods are more useful for analysing multiple deprivation, they can also help to assess k. One of the problems in this regard, however, is that, in exploratory settings, most solutions point to the existence of more than two groups, so in these cases a confirmatory two-class model is needed. Furthermore, approaches based on classification errors must also be taken into consideration (Nájera, 2023; Notten and Kaplan, 2022), as they focus on minimising errors as a means of finding the best split, which echoes the definition of the difference between the poor and the not-poor.

This paper discusses the reliability of different methods for identifying the best possible poverty line/threshold. However, this does not constitute a validation of the concept of the poverty line/observed split, i.e., the extent to which the two groups have the expected theoretical characteristics. A key assumption in poverty research is the existence of two groups: the poor and the not-poor. The scientific measurement of poverty therefore requires both a sound theory, stating how these groups should be identified, and empirical scrutiny of the poverty line. Otherwise, the prevalence of poverty will be subject to arbitrary decisions and unscientific approaches that result in even more questionable rates of limited use for policy design and evaluation.