Use of past care markers in risk-adjustment: accounting for systematic differences across providers

Risk-adjustment models are used to predict the cost of care for patients based on their observable characteristics, and to derive efficient and equitable budgets based on weighted capitation. Markers based on past care contacts can improve model fit, but their coefficients may be affected by provider variations in diagnostic, treatment and reporting quality. This is problematic when need and supply influences on costs must be distinguished. We examine the extent of this bias in the national formula for mental health care using administrative records for 43.7 million adults registered with 7746 GP practices in England in 2015. We also illustrate a method to control for provider effects. A linear regression containing a rich set of individual, GP practice and area characteristics, and fixed effects for local health organisations, had goodness-of-fit of R2 = 0.007 at person level and R2 = 0.720 at GP practice level. The addition of past care markers substantially changed the coefficients on the other variables and increased the goodness-of-fit to R2 = 0.275 at person level and R2 = 0.815 at GP practice level. The further inclusion of provider effects affected the coefficients on GP practice and area variables and on local health organisation fixed effects, increasing goodness-of-fit at GP practice level to R2 = 0.848. With adequate supply controls, it is possible to estimate coefficients on past care markers that are stable and unbiased. Nonetheless, inconsistent reporting may affect need predictions and penalise populations served by underreporting providers.

Supplementary Information: The online version contains supplementary material available at 10.1007/s10198-021-01350-9.


Introduction
Risk-adjustment models seek to predict the health care cost of individuals based on their observable characteristics with the practical purpose of attaching prospective health care premiums or budgets to them.
The aim of risk-adjustment formulae differs according to the structure of health care systems and their financing modalities. In competitive health care systems, they are used by social insurers, for example in Europe or in the US Medicare scheme, or by private insurers, to design payment systems that promote affordability and efficiency [1]. The aim is to equalise risk and compensate insurers for predictable variation in healthcare expenses across groups, incentivising the efficient provision of care and minimising adverse selection [2,3].
In non-competitive care systems, risk-adjustment formulae are used by governments to allocate pooled resources to local strategic purchasers according to the need of their populations, for example in England, Canada, Australia, Norway and Sweden [4][5][6][7][8]. Risk and adverse selection are less of a concern because budgets are ultimately attached to a population identified by its geographic area of residence, rather than to individual risk. Policy makers and researchers are instead concerned with avoiding perverse incentives, for example those related to prior service utilisation, and with containing and allocating the budget equitably. Equitable allocations are intended to create a level playing field based on population need, rather than on existing access to care [5,8], while responsiveness to service users' needs, efficiency and the achievement of outcomes are fostered through complementary policies targeting different levels and aspects of service delivery [4].
The most advanced risk-adjustment formulae rely on person-based models. Individual-level health care expenditure is modelled as a function of individual and area-level variables, which capture both legitimate and illegitimate sources of variation. Such distinction is crucial in systems where cross-subsidisation, or compensation through resource transfer, is used across payment plans or local health organisation budgets to address fairness concerns.
The choice of variables, along with the model predictive power, is very important for setting fair budgets [9]. Models should include a rich set of needs variables, which typically drive legitimate variation in health care expenditure, and for which cross-subsidisation is desirable [6,10]. Variables for which cross-subsidisation is not desirable, such as those reflecting health care supply or insurer responsibilities, should be included to obtain unbiased coefficients on the need variables. Their effect can then be sterilised, for example by fixing their levels to an average value when predicting costs, so that insurers or purchasers are equally rewarded for individuals with equal need [11][12][13][14].
The distinction between need and supply variables has underpinned the methodological development of risk-adjustment formulae in non-competitive health care systems. Research on resource allocation in England is one of the most documented examples and the one with the longest tradition [4,15].
Need variables are typically basic demographic factors (age and gender) plus additional personal characteristics, which improve predictive performance and enable disentangling need from supply [4,10,16]. Need indicators should be feasible, universally and consistently recorded for the population included in a (possibly universal) register of patients, not vulnerable to manipulation and free from perverse incentives [1,4].
Person-level indicators that satisfy these criteria are limited. Past care markers reported in records from previous encounters with the health system, such as recorded diagnoses and the intensity of care received, substantially improve goodness of fit [17]. For example, the R-squared for diagnosis-based prospective models is 15.3% versus only 1.5% for age and sex models estimated on US commercial data [6], and 13% versus 3% in models estimated on UK population data [16].
Because the intensity of care provided and the completeness of recording may differ across providers for patients with the same intrinsic need, past care markers may reflect supply as well as need. Differences may arise from perverse responses, such as increased reported complexity in activity-based financing systems [18], or simply from variations in provider capacity. Lower capacity may lead to lower care and information reporting, especially when monitoring or incentives to record activity are weak, for example when records are not used for payments. Risk-adjustment formulae may then reward higher capacity, resulting in low capitation or premiums for those patient categories who failed to secure access to care when needed [13,14,19–21].
Diagnostic variables are commonly used for risk-adjustment in competitive health care systems, but past care contacts and expenditure less so [6]. All are used in person-based resource allocation formulae in England, if the quality of administrative records is considered sufficient [16,19,22]. With a focus on general and acute care in England, Gravelle et al. [12] showed that the inclusion of morbidity variables in area-level models improved the predictive power and the precision of coefficients. However, they also showed that failure to account for unobserved variation in supply may inflate the positive association between past care markers and expenditure, and in turn need predictions, as a reflection of better supply. They tested the inclusion of provider effects in their model and concluded that local purchaser dummies would suffice to control for supply variation.
Gravelle et al. [12] did not consider whether their conclusions would hold with person-level data, nor did they discuss the potential bias arising from differences across providers in reporting and data quality, as well as in treatment provision. Dixon et al. [19] found that, in the formula for general and acute hospital services, past care markers reflected need rather than supply once the latter was properly accounted for. However, concerns remain over whether supply variation is sufficiently accounted for when the quality of data is particularly poor.
In this paper, we focus on the inclusion of past care markers, diagnoses and counts of care contacts in the most recent formula for mental health services in England. The variability in care provision across providers, the inequity in access to care, and the concerns about the quality of administrative data exacerbate the potential drawbacks of using markers based on past care. The development of the resource allocation formula for mental health services and the selection of variables for risk-adjustment are described in Anselmi et al. [23]. We build on their model.
We highlight the trade-off involved in using past care markers in a formula intended to generate fair allocations, when in practice those markers may reflect undesired differences across providers in diagnosing and reporting. We derive and discuss the probability limits of the coefficients for a set of alternative models, similarly to how Gravelle et al. [12] illustrated the consequences of including additional controls for socio-economic status, morbidity, waiting times and provider characteristics in area-level models. We then illustrate how the inclusion of past care markers improves predictive power and reduces bias in the coefficients on the remaining variables. We formally assess the stability of coefficients by including additional area-level measures of past care contacts with specific providers (provider effects) [24], along with fixed effects for local health organisations. This allows us to control for differences across providers serving populations within the same local health system. We assess bias and improvement in predictive power simultaneously, applying a method proposed by Oster [25].

Health care, resource allocation and risk-adjustment in England
National Health Service organisational structure and resource allocation

Under the current organisational structure, the health care budget in England is allocated to local health organisations called Clinical Commissioning Groups (CCGs). These strategic purchasers commission care from primary, community and secondary care providers. CCGs have been combined into Sustainability and Transformation Partnerships (STPs) and, more recently, into Integrated Care Systems (ICSs). ICSs cover all of England and are made up of NHS organisations and councils, which collaborate to improve health and care in the areas they serve. The majority of the health care budget is allocated to CCGs, except for primary care and some specialised services, which are commissioned centrally. The budget for primary care is mostly allocated directly to General Practitioner (GP) practices, which provide primary care and refer patients for secondary care. Every GP practice is part of one CCG and patients register with a single GP practice.
Since 1976, weighted capitation formulae have been developed in England to ensure that the distribution of resources across local health organisations reflects relative need. A set of formulae, one for each identified funding stream, is used to estimate the determinants of health care costs and to derive need indices and need-based budget shares. These are then aggregated to determine each local health organisation's fair share of the total budget [26–28].

Risk-adjustment development and use in resource allocation
Since 2011, person-based formulae have been applied to individual-level data covering health care users and nonusers registered with GP practices [16,22]. Each formula estimates the individual cost for service use in a given year, as a linear function of person and area level need variables and of area, GP practice and local health organisations supply variables in the previous two years. The estimated coefficients are then used to predict individual cost. Importantly, the effects of the supply variables, including local organisation dummies and provider effects, are neutralised in this cost prediction by fixing the level of supply, usually at the average value across the country. The individual-level predictions, which reflect only variations in the needs variables, are then aggregated to area or GP practice level.
Person level data used in the most recent formulae typically include demographic characteristics (age and gender) for each patient registered with a GP practice, as well as information from linked administrative data sources. Information on individual ethnicity, past service use and diagnoses reported during these past contacts is linked in via an individual patient identifier, for those who have been in contact with services and for whom administrative records exist. Information on need and supply characteristics at the GP practice and small area level, which is available from different sources, can also be linked.

Mental Health risk-adjustment and data
The mental health formula that we use here for illustration was one of the formulae which informed the 2019/2020 round of budget allocations across CCGs [23]. In 2016, the year of the analysis, there were 211 CCGs responsible for commissioning secondary inpatient and community mental health care from 66 providers (Mental Health Trusts) for patients registered with the GP practices in their area. The GP practices were responsible for referring patients to these secondary mental health care providers.
A separate formula for mental health services was first introduced in 1996 [29] and updated multiple times, broadening its scope and developing the methods towards a person-based formula [30,31]. The formula estimated in 2012 [31] included 43 flags for three-digit ICD-10 diagnostic codes, condition severity (measured in layers of at least n contacts with n different care professionals) and care markers in the two previous years, derived from the national mental health dataset [31]. Due to concerns around data quality, past care markers were not included in the latest version of the formula that we use for illustration [23].
Mental Health Trusts report information on every care contact through the national Mental Health Services Dataset (MHSDS). The data collection started in 2003 under different names: first the Mental Health Minimum Dataset (MHMDS) [32] and then the Mental Health and Learning Disabilities Dataset (MHLDDS) [33]. The MHSDS has broadened in scope over time, including patient characteristics, diagnoses and the quantity and type of care provided. The quality of recording is improving [34,35], but is still thought to be generally low and very heterogeneous across providers [36]. This means not only that patients with the same underlying need may receive different levels of care, but also that their diagnoses and care contacts may be reported by providers to different extents. Patients served by a provider that does not record information will therefore appear as having no diagnoses and no multiple care contacts. For example, providers differ in how they cluster patients by need, but these differences are not systematically associated with any observable provider characteristic [36].

A model for mental health care cost
In line with previous work [12,16,30,31,37], we assume that the cost of mental health care is a function of observed and unobserved need and supply variables at the individual, GP practice and CCG level:

C_ijk = α + x_1ijk d + x_2ijk q + x_3ijk f + x_4jk g + z_jk a + P_jk h + K_k c + ε_ijk   (1)

where C_ijk is the total cost for individual i registered with GP practice j, in CCG k.
Need is captured by a set of individual variables, some observable (x_1ijk and x_3ijk) and others not observable (x_2ijk), and by GP practice variables x_4jk. x_1ijk typically includes demographic and other characteristics measurable at the individual level, while x_4jk includes proxies for need not available at the individual level, such as area deprivation. For simplicity, we refer to x_4jk as GP practice variables, but area-level variables are usually also included in an analogous manner.
x_3ijk includes variables, such as past care markers, which are observed only for those individuals who had previously been in contact with mental health services, and for whom the information about the contact was reported in the administrative records. The observed past mental care markers may therefore depend on individual need, as well as on the propensity of a GP practice to refer patients, on diagnostic and treatment intensity, and on the capacity and recording precision of the provider. x_3ijk can therefore be written as a function of the remaining need and supply variables:

x_3ijk = f(x_1ijk, x_2ijk, x_4jk, z_jk, P_jk)   (2)

Health care supply is captured by a set of observed GP practice characteristics (z_jk) and by referral to different mental health care providers, P_jk. P_jk is a set of provider effects (p^m_jk), each the proportion of individuals registered with GP practice j who received care from provider m:

p^m_jk = (1/N_jk) Σ_i u^m_ijk   (3)

where N_jk is the total number of patients registered with GP practice j in CCG k, and u_ijk and u^m_ijk are binary indicators taking value 1 if individual i has had at least one care contact with any provider, or with provider m specifically, in the two previous years, and 0 otherwise.
Each p^m_jk is the product of the proportion of individuals registered with a GP practice who had at least one mental health care contact with any provider and the proportion of those who had at least one contact with provider m. Hence, p^m_jk reflects both the probability of receiving mental health care and that of receiving care from a specific provider m for patients in GP practice j. As most GP practice lists include over a thousand patients, the contribution of i to p^m_jk is minimal, and p^m_jk can be considered exogenous to i. P_jk is a function of GP practice need and supply variables, including a combination of unobserved characteristics of the GP practice and providers (s_jk):

P_jk = g(x_4jk, z_jk, s_jk)   (4)

s_jk captures both the unobserved GP practice propensity to refer patients for mental health care and that to refer to each provider m more specifically. Without affecting the illustration of the bias mechanism, we ignore for simplicity that P_jk also depends on the GP practice average of individual need variables.
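Using the binary indicators defined above, the product just described collapses to the overall proportion of the registered list in contact with provider m:

```latex
p^{m}_{jk}
  \;=\; \frac{\sum_{i} u_{ijk}}{N_{jk}}
        \times
        \frac{\sum_{i} u^{m}_{ijk}}{\sum_{i} u_{ijk}}
  \;=\; \frac{1}{N_{jk}} \sum_{i} u^{m}_{ijk}
```

The identity holds because a contact with provider m implies a contact with some provider, so u^m_ijk = 1 only if u_ijk = 1.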
Whilst most GP practices refer patients to the same mental health provider, local purchasers cover larger areas and multiple GP practices, so their patients could be referred to one or multiple providers. If all GP practice patients were in touch with the same provider, conditional on observed need, p^m_jk would pick up any unobserved GP practice variation in supply. P_jk controls for unobserved differences in diagnostic intensity, care provision and recording across combinations of GP practices and providers (s_jk). P_jk serves the role of provider dummies, but because not every patient uses the service and because patients can use multiple providers, p^m_jk varies between 0 and 1. K_k is a set of CCG dummy variables, which account for CCG factors, such as differences in prices or service commissioning, if any, and mean intensity of care. K_k accounts for differences in supply rather than need and, arguably, if provider effects are not included, for average differences in care provision, diagnostic intensity and data reporting across providers serving the area. ε_ijk is a zero-mean error uncorrelated with the need, supply and CCG variables.
The constant term corresponds to a fixed average capitation rate for all registered patients, conditional on need and supply. When all variables have mean zero, the relative size of this term determines the relative size of funds that would be distributed on a purely per capita as opposed to a need basis.

Empirical models
To illustrate how omitted variable bias may affect the coefficients of the need and supply variables when past care markers and provider effects are or are not included, we estimate the following four alternative models.

Model 1: without past care markers and without provider effects
First, we consider the true cost as a function of need and supply variables, excluding past care markers and provider effects, as is common practice in more conservative person-based models. Substituting Eqs. 2 and 4 into Eq. 1 expresses cost as a function of the remaining observed and unobserved need and supply variables (Eq. 5), which we can estimate on the observed variables as:

C_ijk = α + x_1ijk d + x_4jk g + z_jk a + K_k c + e_ijk   (6)

Model 2: with past care markers and without provider effects
Second, we consider the past care markers as observed need variables. Substituting Eq. 4 into Eq. 1 expresses cost as a function of the need variables, including past care markers, and of the remaining supply variables (Eq. 7), which can be estimated as:

C_ijk = α + x_1ijk d + x_3ijk f + x_4jk g + z_jk a + K_k c + e_ijk   (8)

Model 3: without past care markers and with provider effects
We then consider the provider effects as observed supply variables. Substituting Eq. 2 into Eq. 1 expresses cost as a function of the need variables and of the supply variables including provider effects (Eq. 9), which can be estimated as:

C_ijk = α + x_1ijk d + x_4jk g + z_jk a + P_jk h + K_k c + e_ijk   (10)

Model 4: with past care markers and with provider effects
Finally, we consider both past care markers and provider effects as observed variables. Equation 1 then reflects the true cost and can be estimated as a function of the observed variables:

C_ijk = α + x_1ijk d + x_3ijk f + x_4jk g + z_jk a + P_jk h + K_k c + e_ijk   (11)

The probability limits of each estimated coefficient d̂, ĝ, â, ĉ are reported in Table 1 (columns 1, 2, 3 and 4, respectively). The b's are the probability limits of the coefficients from the regression of the unobserved variables x_2 and s on each observed variable (in turn x_1, x_4, z, K), conditional on the remaining observed ones.

Omitted variable bias and its relevance for need predictions
Comparing the probability limits of the coefficients in each of the four models (Table 1) allows us to illustrate potential omitted variable bias and its direction. The coefficient on the individual need variables x_1ijk in Model 1 comprises the direct effect of x_1ijk on cost and three types of indirect effects, which depend on the correlation between x_1ijk and the unobserved x_2ijk, x_3ijk and s_jk, and which may generate bias: (a) the indirect effect through the unobserved past care markers x_3ijk; (b) the indirect effect of the unobserved individual need x_2ijk through x_1ijk and x_3ijk, if x_2ijk were correlated with x_1ijk so that b_x2x1.x4zK was not 0; (c) the indirect effect of the unobserved supply s_jk through x_3ijk and P_jk, if s_jk were correlated with x_1ijk so that b_sx1.x4zK was not 0.
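These indirect effects are instances of the standard omitted-variable-bias algebra. As a sketch in generic notation (not the paper's own derivation): if the true model is y = xβ + wγ + ε and w is omitted from the regression, then

```latex
\operatorname{plim}\hat{\beta} \;=\; \beta + \gamma\, b_{wx}
```

where b_{wx} is the probability limit of the coefficient from the auxiliary regression of the omitted variable w on the included x, conditional on the other included controls. Each b term in Table 1 plays exactly this role for the relevant omitted variable.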
For resource allocation purposes, the bias would generate concerns only if b_sx1.x4zK were not equal to 0, so that the coefficient would pick up unobserved supply, which would then inappropriately be reflected in the need weights. The bias from b_sx1.x4zK would not arise in Model 3 and Model 4, which include P_jk to further control for supply. Similarly, the coefficient estimates on the GP practice need variables x_4jk could be biased in Model 1 and Model 2 if x_4jk were correlated with unobserved supply s_jk, so that b_sx4.x1zK in Model 1 and b_sx4.x1x3zK in Model 2 were not 0. The bias in the coefficient estimated in Model 1 would also be a concern if past care markers were correlated with unobserved supply, so that the corresponding auxiliary coefficient was not 0 and the coefficient would reflect the effect of unobserved supply on the past care markers.
The estimated coefficients on the observed GP practice supply variables, z_jk, could be biased, and of potential concern, in Models 1 and 3, where the past care markers x_3ijk are unobserved and correlated with z_jk (i.e., the corresponding auxiliary coefficients are not 0). In Models 1 to 4, if supply were correlated with other unobserved individual need (b_x2z not equal to 0), the coefficients would also be biased. This bias would raise concerns if unobserved individual need differed systematically across areas, because when predicting need, systematic variations in unobserved need would be sterilised together with the effect of GP practice supply. Models 2 and 4 would have the smallest bias if b_x2z were not 0, while Model 4 would have the smallest bias if the past care markers were also correlated with z_jk, as the latter effect would be picked up directly by the past care markers.
The coefficients on the CCG fixed effects are larger in Models 1, 2 and 3, as they pick up the average CCG effect of the unobserved supply and need variables. Because CCG fixed effects are considered supply factors and their effect is sterilised in need predictions, the bias in their coefficients is of potential concern only if it reflects unobserved need. This would be the case in Model 1 and in Model 3, if past care markers were associated with unobserved GP practice and CCG supply (i.e., the corresponding auxiliary coefficients were not 0). If unobserved need differed systematically across CCGs, b_x2K would not be 0 in Models 1 to 4 and the estimates of the CCG fixed effects would be biased in every model, though with the smallest bias in Model 4.
The inclusion of the past care markers x_3ijk in Models 2 and 4 improves the precision of the coefficients on the remaining observed need and supply variables. The estimated coefficient on x_3ijk includes the direct effect of past care markers and that of other unobserved individual need variables. Crucially, if past care markers were correlated with unobserved supply, b_sx3 would not be 0, and the coefficient estimated on x_3ijk in Model 2 would reflect supply differences, such as the propensity of different providers to diagnose, treat and record. This would not be the case in Model 4, where the provider effects account for those supply effects.
The inclusion of the provider effects P_jk in Models 3 and 4 improves the precision of the coefficients on the remaining need and supply variables. The coefficient on P_jk is larger in Model 3, where past care markers are not included. If the provider effects P_jk are correlated with unobserved individual need that differs systematically at the area level, the coefficient on P_jk may reflect area differences in unobserved individual need, similarly to the other observed supply variables.
The size of the unexplained variation is expected to fall from Model 1 to Model 4 as more variables are included.

Data
We used the dataset produced for the refreshment of the resource allocation formula for adult secondary mental health care in England [23]. This includes routinely available person-level data on the use and cost of mental health care services in 2015, linked with individual, area and GP practice need and supply variables in the two previous years (financial years, 1 April to 31 March), for all 43,750,558 patients over 20 years of age registered with a GP practice in England. Further details on data sources, extraction and linkage are provided elsewhere [23].
Person level data included mental health care contacts in outpatient settings and in the community, costed by the pay band of the professional providing care, and inpatient bed days, costed by level of intensity. Psychological therapies were included. Specialised services that are nationally commissioned (such as bed days in low, medium and high security wards) were excluded. The individual total annual cost was truncated at £100,000 affecting less than 0.01% of all individuals.
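The top-coding step might look like the following sketch; the cost values are invented for illustration:

```python
import numpy as np

# Hypothetical individual total annual costs in pounds (illustrative values).
annual_cost = np.array([0.0, 250.0, 84_000.0, 640_000.0])

# Truncate (top-code) individual total annual cost at £100,000, as in the
# dataset used here; only a tiny fraction of individuals is affected.
truncated_cost = np.minimum(annual_cost, 100_000.0)
```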
We included demographics and household composition (defined according to age and gender of individuals residing at the same address), ethnicity and physical health diagnoses associated with severe mental illness [38], which we use as observed characteristics due to reliable data quality.
Past care markers were derived from the Mental Health Services Dataset (MHSDS) and include diagnoses and intensity of previous care contacts. We defined a set of binary indicators for whether the person had been in contact with secondary mental health services and diagnosed with any of the 44 ICD-10 chapter F mental health diagnoses in 2013/14 or 2014/15. We generated a set of binary risk indicators, adapted from previous mental health formulae [31], including: the frequency of contacts with six different types of health care professionals defined by pay bands; an inpatient stay of two or more nights in a mental health Trust; a stay in a secure mental health ward; and a detention under the Mental Health Act.
We included GP practice or small geographical area (Lower layer Super Output Area, LSOA) level need variables, measured in 2013/14 or 2014/15: the proportion of the LSOA population receiving out-of-work benefits; a binary indicator for the GP practice serving a high proportion of students; and the GP practice prevalence of severe mental illness. Attributed supply variables included the driving-time distance between the LSOA and the closest Mental Health Trust headquarters.
We generated two sets of binary variables for the 211 CCGs and for the 42 higher-level STPs of which the GP practices were part. We also generated a set of provider effects, calculated as the proportion of patients registered with a given GP practice who were in contact at least once with each of the 66 Trusts providing mental health care and reporting to the MHSDS over the financial years 2013/14 and 2014/15. As an alternative, we also calculated the two components of the provider effects separately: contact with any provider (the proportion of patients registered with a given GP practice who were in contact at least once with at least one Trust) and contact with a specific provider (the proportion in contact with each provider among those patients who had at least one contact with any provider).
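A sketch of the provider-effect computation, assuming a simplified patient-level table with at most one recorded Trust per patient (all names and values are invented; the actual data allow multiple Trusts per patient and are deduplicated to at-least-one-contact indicators):

```python
import pandas as pd

# Hypothetical registered-patient records: one row per patient, with the
# Trust contacted in the two prior years, or None for non-users.
patients = pd.DataFrame({
    "patient":  [1, 2, 3, 4, 5, 6],
    "practice": ["A", "A", "A", "A", "B", "B"],
    "trust":    ["T1", "T1", "T2", None, "T1", None],
})

# Count patients in contact with each Trust per practice; rows with no
# recorded Trust are dropped automatically by crosstab.
contacts = pd.crosstab(patients["practice"], patients["trust"])

# N_jk: full registered list size per practice, users and non-users alike.
list_size = patients.groupby("practice").size()

# p^m_jk: share of each practice's registered list in contact with each Trust.
provider_effects = contacts.div(list_size, axis=0)
```

Practice A has four registered patients, two in contact with T1 and one with T2, giving provider effects of 0.5 and 0.25.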

Model estimation
We estimated individual mental health care costs as a function of need and supply variables using a cross-sectional linear regression model (Ordinary Least Squares) with robust standard errors and CCG fixed effects. We included binary indicators for the interactions between gender and 14 five-year age bands (up to 85+), 16 ethnic groups, 11 household types and 7 physical health diagnoses as person-level measures of need. We also included GP practice (or area) need and supply variables. The CCG dummies control for residual differences in supply.
We estimated the four models presented in Sect. 3.2. We first estimated a base model where we included all individual and area need and supply variables and CCG dummies, but no past care markers nor provider effects (Eq. 6, Model 1). We then additionally included past care markers (Eq. 8, Model 2) or provider effects (Eq. 10, Model 3), or both (Eq. 11, Model 4). For clarity of exposition we have not included time subscripts, but in the empirical models all need and supply variables were measured historically to avoid reverse causality.
For each model, we estimated mental health care cost for all individuals registered with a random 50% of the 7746 GP practices and used the remaining observations as a validation sample for the predictions. We compared the models' predictive performance based on the coefficient of determination (R-squared) at the individual and at the GP practice level, on both the estimation and validation samples. Model selection was based on: predictive power (at the GP practice rather than person level, and on both estimation and validation samples); variation of coefficients when including or excluding variables; and redistributive effects of alternative models, in line with international practice [9]. The objective of the formula is to allocate resources at the commissioner level, but the predictive power of the model cannot be assessed at this level given the inclusion of commissioner fixed effects. Performance is therefore assessed at the GP practice level [16]. Assessing the performance of the model in explaining individual-level variation serves to understand the effects of including different variables and to check coefficient stability.
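The practice-level assessment can be sketched as follows; the person-level costs, predictions and practice assignments are simulated purely for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 2000

# Hypothetical person-level data: each person belongs to one of 50 practices;
# actual costs vary across practices, predictions equal actuals plus noise.
df = pd.DataFrame({"practice": rng.integers(0, 50, size=n)})
df["actual"] = rng.gamma(shape=2.0, scale=100.0, size=n) + 5.0 * df["practice"]
df["predicted"] = df["actual"] + rng.normal(scale=60.0, size=n)

# Aggregate actual and predicted cost to GP practice level first: the
# formula's performance is judged on practice-level averages.
by_practice = df.groupby("practice")[["actual", "predicted"]].mean()
ss_res = ((by_practice["actual"] - by_practice["predicted"]) ** 2).sum()
ss_tot = ((by_practice["actual"] - by_practice["actual"].mean()) ** 2).sum()
r2_practice = 1.0 - ss_res / ss_tot
```

Because person-level noise averages out within practices, the practice-level R-squared is typically far higher than the person-level one, as in the results reported above.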

Coefficient stability
Coefficient stability when including additional covariates is generally interpreted as a sign of limited omitted variable bias. However, small coefficient movements could be due to the low explanatory power of the additional covariates [25]. Oster [25] suggests that coefficient movements should be interpreted alongside R-squared movements and proposes a consistent estimator of the bias which accounts for both. She derives a bias correction using the coefficient and R-squared from a baseline regression without the additional observable controls (β̇ and Ṙ); the coefficient and R-squared from a regression with the observable controls (β̆ and R̆); and the R-squared from a hypothetical regression which maximises R-squared (R_max).
Although the maximum R-squared would be 1, Oster [25] recognises that in practice no regression can fully explain the observed variation and suggests using a value proportional to R̃ as an upper bound for Rmax. The estimator is derived under the assumptions of equal selection on observables and unobservables, and of equal contributions of each observable control to the variable of interest and to the outcome. Oster [25] suggests that the unbiased coefficient lies within the bounds [β*, β̃]. The smaller these bounds, the smaller the coefficient bias.
Following Oster [25], we derived the bias-corrected coefficients for need and supply variables, including past care markers, as:

β* = β̃ − (β̇ − β̃)(Rmax − R̃)/(R̃ − Ṙ)  (12)

where β̃ and R̃ were derived from Model 4, which included the most complete set of observables with provider effects; β̇ and Ṙ were derived from Model 2, which included a limited set of observables; and Rmax is the maximum R-squared attainable. Unlike Oster [25] we did not use Rmax = 1.3 R̃, but set Rmax = 0.2753, the adjusted R-squared of the individual-level regression including GP practice fixed effects (R-squared 0.2755, adjusted R-squared 0.2753). GP practice fixed effects would maximise R-squared at the GP practice, not person, level, which is the main metric used to compare models' predictive power in the Person Based Resource Allocation methodology [16]. Given the still limited set of individual need variables, GP practice fixed effects are not included in any of the models, as their coefficients cannot be interpreted exclusively as need or supply and they would prevent the inclusion of other time-invariant GP practice need and supply variables.
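In code, the correction (here in Oster's equal-selection case, i.e. δ = 1) combines the coefficients and R-squareds of a restricted and a full regression. The sketch below uses simulated data, so the variables and effect sizes are assumptions purely for illustration; the two regressions play the roles of Model 2 and Model 4.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: outcome depends on x (variable of interest) and a control w
# that is mildly correlated with x.
n = 10_000
w = rng.normal(size=n)
x = 0.5 * w + rng.normal(size=n)
y = 1.0 * x + 0.8 * w + rng.normal(size=n)

def ols_r2(y, regressors):
    """OLS with intercept; return coefficient on the first regressor and R-squared."""
    X = np.column_stack([np.ones(len(y))] + list(regressors))
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    return b[1], r2

beta_dot, r_dot = ols_r2(y, [x])         # restricted model (Model 2 analogue)
beta_tilde, r_tilde = ols_r2(y, [x, w])  # full model (Model 4 analogue)
r_max = min(1.0, 1.3 * r_tilde)          # Oster's heuristic bound (the paper fixes Rmax = 0.2753 instead)

# Oster's bias-corrected coefficient with equal selection (delta = 1).
beta_star = beta_tilde - (beta_dot - beta_tilde) * (r_max - r_tilde) / (r_tilde - r_dot)

# The unbiased coefficient is argued to lie within [beta_star, beta_tilde];
# narrow bounds indicate limited omitted variable bias.
bounds = sorted([beta_star, beta_tilde])
```

With the control omitted, beta_dot overstates the effect of x; the correction shifts the full-model estimate in proportion to how much explanatory power remains unexploited between R̃ and Rmax.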

Sensitivity to variation in provider use within purchaser
Given the number of Trusts (66 providers) compared to CCGs (211 local purchasers), one may expect little variation in provider effects within CCGs, but greater variation of provider effects within STPs (42 groupings of CCGs). We tested whether results are sensitive to allowing for greater variation of provider use within purchaser by including STP instead of CCG fixed effects, re-estimating Models 1 to 4 as Models 5 to 8. Table 5 in the Appendix reports the number of CCGs within each STP.

Sensitivity to variations in the definition of provider effects
The two components of the provider effects (contact with any provider and contact with a specific provider) may have different effects on mental health care cost. We tested whether controlling for them separately increases the precision of the estimates of the coefficients of interest. We re-estimated Models 3, 4, 7 and 8 as Models 9, 10, 11 and 12 by including contact with any provider and contact with a specific provider, instead of the provider effects originally defined as proportions of GP practice patients in contact with each provider.
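As a sketch of how these regressors could be constructed from patient-level records (toy data; the practice and trust labels are invented for illustration), the original provider effects are per-practice shares of patients in contact with each trust, while the split definition separates out the overall contact rate:

```python
from collections import Counter

# Illustrative patient-level records (not the real data):
# (GP practice, mental health trust used; None = no contact).
records = [
    ("A", "T1"), ("A", "T1"), ("A", "T2"), ("A", None),
    ("B", "T1"), ("B", None), ("B", None), ("B", None),
]

patients = Counter(p for p, _ in records)             # patients per practice
contacts = Counter((p, t) for p, t in records if t)   # contacts per practice-trust pair
providers = sorted({t for _, t in records if t})

# Provider effects as originally defined: proportion of each practice's
# patients in contact with each provider.
share = {p: {t: contacts[(p, t)] / patients[p] for t in providers} for p in patients}

# Split definition: overall rate of contact with any provider per practice.
any_contact = {p: sum(contacts[(p, t)] for t in providers) / patients[p] for p in patients}
```

Because not every patient has a contact and patients can use several trusts, the shares vary between 0 and 1 and need not sum to 1 within a practice, unlike ordinary fixed-effect dummies.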

Sensitivity to provider data reporting
As a further sensitivity analysis, we estimated Models 1, 2, 3 and 4 excluding CCGs where the majority of patients (> 95%) used Trusts with the lowest reporting quality, assessed based on submissions with missing data for 2013, 2014 and 8 months of 2015 [34,35].

Descriptive statistics
Out of the 43,750,558 adults aged 20 years or older, 4.01% had some contact with secondary mental health and/or IAPT services in 2015/2016. The average cost per person was £80.60, whilst the cost per service user averaged £2008.46 and ranged from £94 to £1,040,963 (before truncation). The descriptive statistics are reported in Table 6 in the Appendix. Table 2 presents the coefficients estimated in Models 1, 2, 3 and 4 and the Oster bias-corrected coefficients for examples of the individual and GP practice variables included in the models. The complete sets of coefficients for all estimated models are reported in Table 6 in the Appendix. As expected, compared with Model 1, the coefficients on individual characteristics in Model 2 and Model 4 are affected by the introduction of the past care markers, which now account for unobserved individual need. Coefficients are similar in Model 1 and Model 3 after the inclusion of provider effects, except for some age groups, suggesting that unobserved GP practice need and supply are controlled for.
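These descriptive figures are internally consistent: the mean cost per person should equal the share of service users multiplied by the mean cost per user. A minimal arithmetic check:

```python
# Consistency check on the reported descriptive statistics:
# mean cost per person = share of service users x mean cost per user.
share_users = 0.0401        # 4.01% of adults with any contact
cost_per_user = 2008.46     # average cost per service user (GBP)
cost_per_person = share_users * cost_per_user  # close to the reported GBP 80.60
```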

Importance of past care markers
Changes in the coefficients on area and GP practice need variables indicate that in Model 1 these variables picked up need which is captured by past care markers in Model 2. The inclusion of provider effects in Model 3 reduces the coefficients on these variables compared with Model 1, where the coefficients may also pick up the effect of some unobserved GP practice need and supply if these are not appropriately controlled for by the remaining variables. The inclusion of provider effects in Model 4 does not change the coefficients compared with Model 2, suggesting that, once individual need is controlled for, provider effects account not for GP practice need but for supply.
The coefficients on area supply variables are affected substantially by the inclusion of past care markers in Model 2, and only marginally by the inclusion of provider effects in Model 3. If past care markers are not included, coefficients on area supply variables may reflect unobserved differences in need, which are then sterilised when predicting need.
The inclusion of past care markers in Model 2 improves the R-squared from 0.0073 to 0.2746 at the individual level and from 0.7203 to 0.8155 at the GP practice level.

Importance of provider effects
Provider effects in Model 3 and Model 4 pick up variation in supply across GP practices within CCGs and reduce the size and precision of the coefficients on CCG dummies. The reduction is larger from Model 1 to Model 3, where past care markers are not included, than from Model 2 to Model 4, indicating that provider effects pick up variation in unobserved need otherwise captured by CCG dummies. The reduction in the coefficients on CCG dummies after including past care markers is smaller when provider effects are also included (from Model 3 to Model 4 rather than from Model 1 to Model 2). Moreover, the inclusion of past care markers tends to reduce the size of the coefficients on the provider effects from Model 3 to Model 4. These are indications that provider effects account for variation in unobserved need at the area level when it is not accounted for by individual controls. The inclusion of provider effects increases R-squared at the GP practice level only, from 0.7203 in Model 1 to 0.7929 in Model 3 without past care markers, and from 0.8155 in Model 2 to 0.8481 in Model 4 with past care markers.

Robustness checks
The coefficients on past care markers in Model 2 are not affected by the inclusion of provider effects in Model 4: any correlation between past care markers and unobserved differences in supply and recording across providers was already controlled for by the CCG fixed effects. The Oster bias-corrected coefficients are very close to the coefficients estimated in Model 4, making the bounds for the true coefficients very narrow. Coefficients on the past care markers are thus stable even when their additional explanatory power is accounted for through R-squared.
When replicating the analysis using higher-level purchaser fixed effects (STPs rather than CCGs), allowing for greater variation in provider use, results remain unchanged for individual need variables, but coefficients on area and GP practice variables are smaller. Table 3 presents examples of coefficients estimated in Models 5 to 8 using STP dummies. The full set is presented in the Online Appendix. Table 4 presents the coefficients for Models 3, 4, 7 and 8 re-estimated as Models 9, 10, 11 and 12, including contact with any provider and contact with a specific provider as two separate components of provider effects. Results are unchanged for individual variables, while the size of the coefficients on area and GP practice need and supply variables is affected, suggesting that coefficients on need variables may have been biased by supply. R-squared is unaffected.
When estimating Model 2 and Model 4 excluding CCGs served by Trusts with low reporting quality, the coefficients remained unchanged (Model 12 and Model 16 are reported in the Online Appendix). This suggests that CCG dummies and provider controls account for provider variation and that coefficients are not biased by reporting differences.

Discussion
We have illustrated how the inclusion of markers from past care contacts in risk-adjustment models can increase predictive power and disentangle the contributions of need and supply factors to variations in costs. The inclusion of provider effects can account for the potential correlation between past care markers and systematic differences in diagnostic, treatment and recording quality. Our results build on the development of risk-adjustment formulae for resource allocation in England. We highlight how in person-based models, as in area-level models [12], the inclusion of morbidity markers improves predictions and, alongside appropriate provider controls, ensures unbiased coefficients on the need variables. As discussed in Dixon et al. [37], when individual need variables are included in the model, past encounters and diagnoses at the area level tend to reflect supply rather than need. We show that past care use measured at the area or GP practice level serves as a control for unobserved supply variation across GP practices and providers within local health organisations. These provider effects perform much the same function as provider fixed effects; however, because not every patient uses the service and because patients can use multiple providers, their values vary between 0 and 1 across GP practices, reflecting variation in total use as well as in the extent to which each provider is used.
With individual level data, we were able to refine the approach previously proposed at the area level [24] using individual past care markers and provider effects, alongside local health organisation fixed effects. We could also split provider effects into their GP practice and mental health trust components, which could both affect access to care. The inclusion of provider effects increased explained variation and improved the precision of the coefficients, but local health organisation fixed effects and GP practice supply variables are generally sufficient to avoid bias in the coefficients on individual-level need variables.
Results from the formal assessment of the bias in the coefficients on past care markers are relevant more broadly for risk-adjustment models based on administrative records from past care. It is well known that past care markers improve prediction, but there are still concerns about omitted variable bias [6]. Adapting the Oster [25] method to assess bias by evaluating coefficient stability against variations in coefficient size and in R-squared, we showed that with sufficient control for supply factors the bias is minimal.
The use of diagnostic variables, and especially of past care and expenditure, may also provide perverse incentives and reproduce unfair utilisation patterns [6]. We showed that past care use at the area or GP practice level, if differentiated by provider, can help control for variations in supply. In models that use standardisation techniques [39] to disentangle need- and supply-driven cost, the effect of area-level past care use can be sterilised to control for any unobserved difference in supply. The approach is valid as long as there are enough variables capturing need at the individual level, such as, in our example, past mental health care markers. If those are not included, there is a risk of inappropriately sterilising legitimate differences in need.
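A minimal sketch of the sterilisation step (synthetic data; the single need and supply variables and their coefficients are assumptions for illustration): fit the full model, then hold the supply variable at its national mean when predicting, so that the predictions vary with need only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: cost depends on an individual need variable and an
# area supply variable (illustrative stand-ins, not the formula's variables).
n = 5_000
need = rng.normal(size=n)
supply = rng.normal(size=n)
cost = 50 + 20 * need + 10 * supply + rng.normal(scale=30, size=n)

# Fit the full model including both need and supply.
X = np.column_stack([np.ones(n), need, supply])
beta, *_ = np.linalg.lstsq(X, cost, rcond=None)

# Sterilised prediction: hold supply at its national mean so that the
# estimated budgets reflect differences in need only.
X_steril = X.copy()
X_steril[:, 2] = supply.mean()
need_pred = X_steril @ beta
```

The sterilised predictions are uncorrelated with supply by construction, which is the intended property; whether the remaining variation is truly need depends, as argued above, on how well need is captured at the individual level.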
We were able to carry out a highly comprehensive analysis including the whole adult population of England (over 40 million people), over 7500 primary care providers, 66 mental health care providers and 211 purchasers. The large size of the dataset allowed the inclusion of a large number of covariates: 14 five-year age bands (up to 85+), 16 ethnic groups, 11 household types and 7 physical health diagnoses, plus 50 past mental health care markers, along with GP practice and area-level variables, 66 provider controls and 211 CCG fixed effects.
We were able to include a relatively large set of individual characteristics compared to other studies based on administrative records. Indeed, individual ethnicity and household type were used for the first time in an analysis covering the whole population in England. A model including both individual past care markers and area controls for past provider contacts ideally produces the most equitable allocations. However, its implementation may be flawed unless appropriate techniques to impute unrecorded past care markers are developed and agreed. Because the formula only aims at generating a level playing field, it is important that it is complemented by other policies targeting different levels and aspects of service delivery, which would also help to improve the reliability of past care markers. In settings like England, where most patients within the same local health organisation (CCG) use the same provider, the inclusion of local health organisation dummies could sufficiently control for unobserved supply bias in the coefficients on past mental health care markers. However, the additional inclusion of provider controls contributes to minimising the bias in GP practice need variables. The choice between models with and without provider care contacts depends on policy makers' preferences over a trade-off. The model with provider care contacts underestimates differences in need but ensures that any unobserved difference in supply is controlled for. The model without provider care contacts increases the differentiation in estimated need, whilst accepting that some of it may reflect unobserved supply.

Appendix
See Tables 5 and 6. Observations 21,319,709; robust t statistics in parentheses; *p < 0.05; **p < 0.01; ***p < 0.001; Rmax used in the calculation of the Oster bias-corrected coefficients is 0.2753. The table reports coefficients for examples of variables from the sets of person characteristics (age, gender, ethnicity and household type) and past care markers (ICD-10 Chapter F mental health diagnoses and risk indicators). The Online Appendix reports coefficients for all variables.