Key Points for Decision Makers

Current processes for allocating public spending between departments are not informed by appropriate evidence about trade-offs. Greater public good could potentially be achieved by reallocating existing budgets toward activities with higher value to society.

We identify a number of ways of measuring the incremental productivity from potential specific projects/policy changes at the budgetary margin where decisions are being made, and consider the strengths and limitations of each.

We suggest an alternative approach, involving the identification of a core set of generic outcome attributes that could be used to systematically measure and compare the outcomes produced from disparate public sector activities in a commensurate manner.

The approach could be accompanied by evidence on the preferences of the general public for different types of public sector outcomes, facilitating comparisons of value for money.

Further research in this area has the potential to substantially improve the evidence available to inform the allocation of public sector budgets.

1 The Challenge of Assessing Value for Money in the Public Sector

For several decades, considerable effort has gone into mechanisms to achieve more efficient allocation of funds within specific areas of public spending. Nowhere are those efforts more obvious than in the healthcare sector and its systematic use of cost-effectiveness analysis to assess new healthcare technologies. For example, in England, the National Institute for Health and Care Excellence (NICE) assesses the cost effectiveness of health technologies using evidence on the incremental cost-effectiveness ratio (ICER), expressed in terms of the incremental cost per quality-adjusted life year (QALY) gained. The ICER of new technologies is compared against a cost-effectiveness threshold ratio that is intended to represent the opportunity cost of marginal spending in the National Health Service (NHS) [1]. In principle, any proposed new spending with an ICER that is higher than NICE’s threshold should not be introduced because it is likely to displace more health benefits than it generates [2, 3].
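The decision rule described above can be sketched in a few lines. This is a minimal illustration of the arithmetic only, not NICE's actual appraisal procedure: the function names, the cost and QALY figures, and the threshold value are all hypothetical.

```python
def icer(delta_cost, delta_qalys):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    if delta_qalys <= 0:
        # Assumption of this sketch: only technologies that add QALYs are assessed.
        raise ValueError("ICER is computed here only for a positive QALY gain")
    return delta_cost / delta_qalys


def is_cost_effective(delta_cost, delta_qalys, threshold):
    """True if the ICER does not exceed the cost-effectiveness threshold."""
    return icer(delta_cost, delta_qalys) <= threshold


# Hypothetical figures, chosen only to illustrate the comparison.
new_tech_icer = icer(delta_cost=12_000, delta_qalys=0.5)      # cost per QALY gained
decision = is_cost_effective(12_000, 0.5, threshold=20_000)   # compare with threshold
```

In this sketch, a technology whose ICER lies above the threshold is rejected on the grounds that, at the margin, it would displace more health than it generates.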

There are considerable challenges to implementing such assessments, not least in estimating the opportunity cost of NHS spending at the margin [1, 4,5,6,7]. Nevertheless, the concept of the QALY [8,9,10,11] is widely accepted as providing a pragmatic, operational measure of value that facilitates comparisons of cost effectiveness between healthcare interventions [12]. The measurement approach is widely used in health technology appraisal (HTA) systems around the world [13].

The QALY focuses on outcomes achieved in terms of improving survival and quality of life, combining these into a single metric that allows comparisons of effectiveness and cost effectiveness of technologies aimed at different conditions and dissimilar outcomes. This generally works well when interventions are relatively small in comparison with the overall health budget, primarily entail resource use within the healthcare sector, and where the outcomes are exclusively or primarily QALY gains.
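To make the combination of survival and quality of life concrete, the following sketch computes QALYs as years lived multiplied by a quality-of-life weight. The health profiles and weights are hypothetical, and discounting of future years is deliberately omitted for simplicity.

```python
def qalys(periods):
    """Sum of (years lived x quality-of-life weight) over health-state periods.

    `periods` is a list of (years, weight) pairs, where the weight is on a
    scale anchored at 0 (dead) and 1 (full health).
    """
    return sum(years * weight for years, weight in periods)


# Hypothetical health profiles: (years, quality-of-life weight) per period.
with_treatment = [(2.0, 0.8), (3.0, 0.6)]
without_treatment = [(2.0, 0.5), (1.0, 0.4)]

# The QALY gain captures both longer survival and better quality of life.
qaly_gain = qalys(with_treatment) - qalys(without_treatment)
```

Because both profiles are expressed in the same unit, interventions aimed at different conditions can be compared on a single scale.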

However, occasionally there is a need to evaluate policies that have major effects across the whole healthcare budget, or where there are strong cross-sectoral elements of costs, outcomes or both. An example is the relationship between the health- and social-care sectors, the importance of which has been recognised by the development of an outcomes measure suitable for use across both sets of services that extends the range of services that can be subject to QALY evaluations [14]. More generally, it has occasionally been argued that benefits beyond QALYs, such as increased productivity of workers, should be considered in assessing health technologies, although it remains unclear how to take into consideration such wider factors without significantly departing from the cost-per-QALY framework. A much broader example of the challenges encountered in evaluation of cross-sectoral costs and effects can be found in the difficulty of assessing alternative government responses to COVID-19, where there are complex trade-offs between health and wealth and effects on the socio-demographic, socio-economic and generational distributions of costs and effects.

Evaluating options where costs and benefits cross the borders of different areas of the public sector poses challenges in part because other areas of the public sector lack a single measure of output and value equivalent to the QALY, even though the measurement of effects on mortality and to a lesser extent on quality of life has also historically been a research focus in non-health sectors such as transport or environment [12]. This lack of outcome metrics in areas of public sector activity beyond health not only contributes to the challenges of evaluating multi-sectoral interventions, it is also reflected in the lack of an 'all-encompassing' measure of value to inform assessments of value for money across the public sector. This represents a fundamental weakness of the public sector strategy regarding resource allocation.

In this paper we aim to stimulate debate and research efforts to improve the evidence available to inform the political process. In particular our aims are threefold: first, to identify what we consider to be a very important gap in the evidence currently being used to allocate budgets between public sector activities in the UK; second, to provide an overview of the main types of approaches that are available and their limitations; and third, to propose a pragmatic approach that could be taken to the measurement and valuation of disparate public sector outputs in a commensurate manner.

2 A Case Study: Value for Money Assessment in the Public Sector in the UK

2.1 Are there Mechanisms in Place to Make the Outcomes Across Departments Comparable?

Within government departments it appears there is a considerable but variable degree of evaluation of major projects or proposals, some of it undertaken internally and some commissioned externally from academic researchers or consultants. These should follow the methods for economic evaluation as set out in the HM Treasury’s ‘Green Book’ [15].

The Green Book strongly advocates the quantification of the benefits associated with a new policy. Its preference is that this quantification should be in monetary terms via Social Cost Benefit Analysis (CBA), or, where CBA is not possible, using measures of subjective wellbeing, provided that is appropriate to the context/option under appraisal (e.g., community cohesion). This is intended to ensure consistency in the methods used by different Departments. In practice, undertaking a full CBA is not always possible or practicable, and there is considerable variation in the detailed methods used and in the effort that goes into describing, measuring and valuing the myriad different benefits that proposals provide, reducing the direct comparability of such evaluations.

Table 1 illustrates this variation. We extracted information from a selection of published Impact Assessments (IAs) in the following areas: Department for Education (DfE), Department of Health and Social Care (DHSC), Ministry of Justice (MoJ), Department for Environment, Food and Rural Affairs (DEFRA), and Department for Transport (DfT). We observed a lack of consistency in the methods used by different departments. For example, according to the Green Book, the impacts of government proposals on health should always be considered. However, few IAs in non-health departments quantify the impact on health, and the monetisation of health impacts is not consistent across reports.

Table 1 Outcome description for selected impact assessments published by the UK government

Different areas in the Government are putting much effort into helping local/area-related authorities make evidence-based spending decisions. For instance, Public Health England (PHE) set up a Prioritisation Framework spreadsheet to help local public health teams’ decision making [16]. Other departments designed specific toolkits as guidance to local bodies on how to plan and undertake an impact evaluation (see for instance toolkits created by the Department for Business, Energy and Industrial Strategy or the DfT [17, 18]). However, toolkits generally work as a guideline and fail to provide an explicit list of output measures to be used in the evaluation: CBA is rarely suggested as the recommended form of evaluation. As a consequence, IAs created using these toolkits are not always directly comparable even within the same department.

2.2 Are There Mechanisms in Place to Make Comparisons Across Departments?

Public Spending Reviews set spending limits for each government department. Each department is then required to prepare an annual departmental business plan setting out how it will deliver agreed objectives and government manifesto commitments within its spending limit. There appears to be no formal process through which comparisons are made across departments in terms of the value of, and priorities between, specific additional spending proposals.

In a review carried out by the UK government in 2015 [29], lack of comprehensive and comparable evidence, as well as difficulties in the measurement of outcomes, were highlighted as some of the main complications of comparing value for money across departments. Recommended actions included the use of multiple outcome measures [30]. A recent review commissioned by the Treasury proposed a Public Value Assessment Process Framework (based on the work of Mark H. Moore defining ‘Public value’ [31] and ‘Public value account’ [32]). However, it largely fails to address the process by which budget allocations are made between departments, and whilst it highlights the need to engage users and citizens, it talks of the need to convince taxpayers of the value being delivered by spending rather than any process that might seek out and incorporate the values of the public to improve the allocative efficiency of the prioritisation process.

In structured discussions we undertook with those familiar with the processes, all agreed that few if any procedures, formal or informal, explicitly addressed comparisons of value for money between departments. (See Table 1 for the disparity of outcomes considered relevant across different departments.)

3 Comparing Across Sectors: Idealism Versus Pragmatism

3.1 A Framework for Decision Making

Attempting to assess and compare value for money across disparate sectors is a challenge, and any rationale underlying a general framework is debatable [33]. The most common approach describes the aim of the decision maker to be the efficient inter-area allocation of a budget, achieved by measuring net benefits (welfare) in order to establish critical cost-benefit ratios [34]. This goal might not be completely unconstrained; for example, it might be subject to some equity or distributional constraints.
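One hedged way to formalise this aim is as a constrained maximisation. The notation is illustrative and introduced here as an assumption: \(B_j\) is a benefit (welfare) function for department \(j\), assumed additively separable and concave; \(m_j\) is that department's budget; \(M\) is the total budget; and \(\lambda\) is the shadow price of the budget constraint.

```latex
\max_{m_1,\dots,m_J} \; W = \sum_{j=1}^{J} B_j(m_j)
\quad \text{subject to} \quad \sum_{j=1}^{J} m_j = M ,
\qquad \Longrightarrow \qquad
B_j'(m_j^{*}) = \lambda \quad \text{for all } j \text{ with } m_j^{*} > 0 .
```

Under these assumptions, an efficient allocation equalises the marginal benefit of the last pound spent across all funded departments; \(1/\lambda\) then plays the role of a critical cost-benefit ratio. Equity or distributional constraints would add further terms to this condition.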

Whatever the goal of public spending is, we would argue that decisions about allocating public sector budgets are being made anyway, in the absence of evidence, and that improving the evidence base for such decisions would assist by promoting debate and explicit consideration about what the goals of public spending are.

3.2 Options for Examining Value for Money Across Departments

There is a disparate literature on this issue, which includes a range of theoretical approaches to this problem and examples of elements of these being operationalised. Most approaches seem to fit within three broad types, as characterised in Fig. 1.

Fig. 1 Elements of resource allocation framework. M: government budget; {m1,…, mJ} departmental budgets for j = {1,…, J} public sector departments; \({h}_{j}\) health outcomes from department j; \({a}_{j}\) outcomes in non-health attributes from department j; W = total welfare of society; {O1,…, OJ} departmental outcomes. H = health; A = non-health. Source: adopted from [35]

The first approach assesses value for money by establishing the trade-off between sector-specific outcomes (see framework A in Fig. 1). The main advantage of this option is that decisions would be informed by sector-specific aggregate measures, which are in turn those measures that better capture the levels of efficiency within a department (such as QALYs for HTA [3], or Prevented Fatalities for road safety). This approach is probably the most idealistic one, and arguably something to aspire toward. However, this option would constitute a very ambitious and extensive research agenda at present, given that the development and use of QALYs as a health-specific aggregate measure has not been mirrored in other public sector departments. This approach does not therefore provide a pragmatic means of proceeding in the short to medium term.

A second approach would require the use of a single measure for depicting policy impacts (see framework B in Fig. 1). The most widely used measures are direct (monetary outcomes) or indirect (subjective wellbeing).

An example of the direct monetisation of policy impacts and costs is provided by the UK (see Box 1). This method still involves the challenging task of quantifying and valuing (e.g., through willingness-to-pay) a long list of dissimilar outcomes. It is clear from the IAs that we reviewed, examples of which are provided in Table 1, that in many cases the important intangible outcomes are extremely difficult to value directly, and the danger is that the focus is transferred from the important outcomes to those that can most easily be monetised. In fact, recent research has shown that even similar outcomes (such as ‘health’ or ‘human life’) are given inconsistent monetary valuation in different government departments [35]. Our review of IAs suggests that in practice it is too difficult to put monetary values directly on many of the key outcomes. This suggests that the current approach, despite being consistent with (and indeed advocated by) the Green Book, is in practice problematic and as a result provides inadequate evidence to ensure that decisions about public sector spending are efficient.

A second method is to measure the impact of every policy in terms of subjective wellbeing (SWB) [36]. There has been considerable interest in the use of SWB approaches by previous governments as an alternative or complement to Gross Domestic Product (GDP) [37], and there are strong academic advocates for the use of SWB to evaluate policy [36]. The UK’s Office for National Statistics now includes four ‘standardised’ questions on SWB in national surveys. Measuring outcomes in terms of wellbeing would in principle facilitate direct comparison of outcomes between departments. However, to date, SWB has rarely been used as the principal output measure directly linked to specific decision making. Whilst there are strong theoretical foundations for SWB, there are unresolved issues with the measurement properties of the questions used to measure it [38]. The main drawback of this approach is the lack of consensus about whether any measure of wellbeing could be sufficiently robust, reliable and comprehensive, and yet sensitive to relatively small changes in overall wellbeing, to capture the value of diverse policies across the public sector. Whilst this method may be useful as a tool to assess ex post the effect of substantial policy programmes, we are not aware of any examples of it being successfully used to evaluate ex ante the perceived value of specific policies and programmes. An example of the existing use of SWB can be found in Bhutan (see Box 2).

A third approach is the categorisation of the various outcomes in each case into principal ‘outcome types’ (see framework C in Fig. 1). This method usually relies on pre-determined schedules of monetary values (and thus this approach could be seen as the indirect monetisation of policy impacts). Examples of this mechanism can be found in the development and use of ‘social value banks’ in Australia and New Zealand (see Box 3).

Measuring outcomes in a common unit of wellbeing or money would provide the relative value of all of the outcomes. Theoretically, the three methods reviewed in this section would ultimately result in similar ‘exchange rates’ between public sector outcomes in terms of wellbeing, implying a similar equilibrium for the optimisation process. In practice, limitations of the methods for direct or indirect monetisation and for measuring SWB, and their application, are likely to lead to different conclusions.

4 A Possible Way Forward: A Generic Descriptive System for Public Sector Outcomes in the UK

Given the drawbacks of both direct monetisation and measures of SWB, the indirect monetisation of outcomes (see framework C in Fig. 1) is arguably a more pragmatic approach. However, the specification of the principal outcomes is a complex task, and if these attributes are defined de novo for each individual project, then comparability may be undermined by a lack of transparency and consistency between projects.

We propose an alternative approach for assessing public sector outcomes in the UK, in which (a) the outcomes of each policy would be systematically measured using a standardised, pre-defined set of ‘outcome types’ or attributes to describe the changes resulting from each proposed public policy/good or service; and (b) stated preference studies, such as Discrete Choice Experiments (DCEs) [45], would be used to specify the relative weight attached to the achievement of disparate outcomes. DCEs provide a means of identifying the extent to which respondents are willing to trade off an improvement in one attribute (outcome type) against a worsening in another, and hence the relative value attached to the achievement of different types of outcomes [46,47,48]. There is a growing literature on the use of DCEs to generate evidence for public policy [49], but it tends to be restricted to the one-off evaluation of specific policies. The approach we propose aims to bridge the gap between the direct and indirect monetisation methods, and it may bring us closer to the ideal scenario of a political context informed by people’s preferences over different aspects of public wellbeing.
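The notion of a trade-off between attributes can be illustrated with a short sketch. It assumes a linear additive utility function, which is a common simplification in DCE analysis but an assumption here; the attribute names and preference weights are hypothetical and are not estimates from any study.

```python
def marginal_rate_of_substitution(weights, gain_attr, loss_attr):
    """Units of `loss_attr` respondents would give up for one more unit of `gain_attr`.

    Assumes linear additive utility, U = sum(weight * level), so the
    trade-off between two attributes is the ratio of their weights.
    """
    return weights[gain_attr] / weights[loss_attr]


# Hypothetical preference weights (utility per unit of each attribute).
weights = {
    "life_expectancy_years": 0.8,
    "educational_attainment_points": 0.4,
}

mrs = marginal_rate_of_substitution(
    weights, "life_expectancy_years", "educational_attainment_points"
)
```

With these illustrative weights, one additional unit of the first attribute is valued the same as two units of the second, which is the kind of relative value a DCE is designed to reveal.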

The literature provides some indicators of UK societal preferences. For instance, in the UK, health (55%) and education (22%) were chosen as the highest priority for government spending in 2016 by respondents to the British Social Attitudes Survey. However, as highlighted in a report by PHE [16], new education policies are likely to involve health-related outcomes (e.g., awareness of bad health habits), and new health technologies may also have an impact on education (e.g., pain relief treatments impacting on higher achievements). Therefore, considering attributes that are meaningful across sectors will be of key importance.

The feasibility of providing a structured way of measuring public sector outcomes is reinforced by the existence of other outcomes frameworks (for example, the Australian National Development Index or the Canadian Index of Wellbeing) aimed at providing a complete picture of national wellbeing. Such initiatives, which are part of the wider movement to go beyond GDP [50], could provide highly relevant inputs to the development of a measure of public sector outcomes. Ultimately, however, a public sector outcomes measure should reflect the attributes of key importance to the taxpayers in the country whose decisions its use may inform.

Data extraction from published IAs provided a list of outcomes associated with the UK policy options under consideration. If it were possible to categorise all the outcomes in a set of common attributes, these could then be used systematically to describe, measure, and value, in a commensurate way, the principal outcomes of any governmental activity across departments, whether that activity may be the regulation, funding or provision of goods and services.

Table 2 shows the type of attributes found in existing IAs that might provide a starting point for developing a common set of attributes, and examples of policies, in and outside of the NHS, whose outcomes can (in part) be captured by that attribute.

Table 2 An illustration of a generic descriptive system for public sector outcomes

Research would be required to develop our suggested approach. First, a systematic review is required to yield a complete list of public services at the margin across different areas of the public sector, characterised in terms of their costs, outputs and outcomes, expressed in as tangible and meaningful a sense as possible. Second, the key attributes of public sector outcomes would need to be extracted from the outcome list. These would form the basis for a generic descriptive system for public sector outcomes, where each attribute is able to be measured in quantitative terms—either on a cardinal scale (e.g., mortality rates) or categorically (high, medium, low) in order to clearly describe the extent of the achievement of that outcome, and changes to outcomes that result from policy options under consideration.

These outcome categories could then form the basis for a DCE design, which will seek to elicit the stated preferences of a large sample of the general public with respect to sets of these outcomes. An illustrative example is provided in Fig. 2.

Fig. 2 Illustration of a pairwise choice task to elicit stated preferences for cross-departmental outcome attributes through a discrete choice experiment (DCE). Question: Which is better, Policy A or Policy B? The consequences of implementing the policies are shown below
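Responses to a pairwise task of this kind are often analysed with a logit choice model. The sketch below shows how such a model would translate attribute levels into the probability of choosing Policy A over Policy B. The logit form, the attribute names, the weights and the policy profiles are all assumptions made for illustration; they are not taken from the figure.

```python
import math


def utility(weights, levels):
    """Linear additive utility of a policy described by its attribute levels."""
    return sum(weights[attr] * levels[attr] for attr in weights)


def prob_choose_a(weights, policy_a, policy_b):
    """Binary logit probability that a respondent prefers Policy A to Policy B."""
    diff = utility(weights, policy_a) - utility(weights, policy_b)
    return 1.0 / (1.0 + math.exp(-diff))


# Hypothetical attribute weights and policy profiles.
weights = {"health": 0.5, "education": 0.3, "crime_reduction": 0.2}
policy_a = {"health": 2, "education": 0, "crime_reduction": 1}
policy_b = {"health": 0, "education": 2, "crime_reduction": 1}

p_a = prob_choose_a(weights, policy_a, policy_b)
```

In an actual study the weights would be unknown and would be estimated from the observed choices of many respondents; this sketch runs the model in the forward direction only.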

There is an obvious analogy between the proposed approach to measuring and valuing public sector outcomes, and approaches widely used to measure and value attributes of health-related quality of life, such as the EQ-5D, in the healthcare sector [51]. The use of stated preference methods to obtain the values of the general public for generic measures of outcomes has, in that context, been shown to be feasible and to provide a robust and acceptable evidence base for public decisions.

The DCE would identify the relative values of the public (or subgroups of beneficiaries of governmental activities) for different types of outcomes and the trade-offs they are willing to make. This could help to inform judgements about value for money across the public sector and this in turn would enable establishment of the marginal value placed on outcomes from different areas of government spending.

We recognise of course that assessing budget proposals based only on their expected impact on wellbeing, with the impact measurement reflecting the preferences of the taxpayers, is an ambitious project. Government budget allocation decisions will rightly remain political decisions, which will reflect a variety of considerations and judgements that go beyond even the broad set of outcomes that would form the basis of our approach; even when developed fully it would be a decision aid to inform and illuminate a complex process.

Beyond budget allocations, the generic descriptive system for public sector outputs suggested in this paper could be a valuable tool to inform decision making in any setting that involves the comparison of value for money of interventions whose outcomes have an impact in multiple sectors. A good example of this context is provided by Walker et al. [52], who propose an extension of the standard ‘impact inventory’ in health and medicine [53] with the aim of capturing the effects of the intervention in sectors beyond health. A set of attributes representing what is most relevant to society, such as proposed in this paper, could provide the key dimensions of the impact inventory. In addition, the use of DCEs to elicit the relative values of the public for the different attributes could produce a composite measure of value for money across departments, resolving the problems of aggregation related to the impact inventory expansion [54].
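A composite measure of this kind could, in its simplest form, be a weighted sum of changes in each attribute, divided by cost. The sketch below assumes this linear aggregation; the attribute names, weights, proposals and costs are hypothetical and serve only to show how proposals from different departments would be placed on one scale.

```python
def composite_value(weights, outcome_changes):
    """Weighted sum of attribute changes, using public preference weights."""
    return sum(weights[attr] * change for attr, change in outcome_changes.items())


def rank_by_value_for_money(weights, proposals):
    """Return (name, composite value per unit cost) pairs, best value first."""
    scored = [
        (name, composite_value(weights, p["outcomes"]) / p["cost"])
        for name, p in proposals.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical weights (e.g., elicited through a DCE) and spending proposals.
weights = {"health": 0.5, "education": 0.3, "environment": 0.2}
proposals = {
    "health_proposal": {"cost": 10.0, "outcomes": {"health": 4, "education": 0}},
    "schools_proposal": {"cost": 5.0, "outcomes": {"education": 5, "environment": 1}},
}

ranking = rank_by_value_for_money(weights, proposals)
```

A linear sum is only one possible aggregation rule; interactions between attributes or distributional concerns would require a richer specification.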

5 Conclusion

The current absence of a formal method to inform the allocation of budgets between departments through public valuation is an important gap in the budgetary allocation methods in the UK. Such a method would be particularly valuable where policies have multiple outcomes that go beyond the main focus of the department in question. In addition, there is considerable interest internationally in methods for capturing benefits other than health gain in the evaluation of new healthcare technologies (e.g., see the special ISPOR task force report on US Value Assessment Frameworks [54]).

Whilst there are clearly methods employed to attempt to ensure that, within areas of spending such as health, opportunity costs are identified and considered, there seems to be little formal consideration in the UK of the opportunity cost at the margin across departments. As no doubt is also the case in many other countries, departmental budgets depend largely on the previous year’s budget allocation, adjusted according to broad politically determined priorities rather than on explicit mechanisms aimed at achieving allocative efficiency of public spending. However difficult, there is a real need for methods that can provide well-founded comparisons of value for money across departments.

This paper has identified a number of possible approaches being tried in different countries. We suggest a way forward for further research to develop a sound yet pragmatic approach for identifying and describing the outcomes from disparate public sector activities in a broad and consistent manner that facilitates making comparisons of public valuations of projects.

In the absence of systematic evidence of this kind, it is unclear whether UK taxpayers’ resources are being used in a manner that is allocatively efficient. It is potentially the case that the welfare of UK people could be substantially increased without an increase in spending, simply by reallocating budgets toward those activities and policies that produce more of the outcomes that the general public value. The lack of such evidence, or any mechanism to achieve that end, seems like an important wasted opportunity.