Levelling the playing field: The effect of including widening participation in university league tables

Nationally and internationally, universities are ranked in university league tables (ULTs). Sustained academic criticism of the rationale and methodology of compiling ULTs has not stopped these rankings exerting considerable pressure on the decisions of university managers. The compilation of ULTs is an inherently political act, with the choice and weighting of metrics resulting in particular characteristics of individual institutions being rewarded or penalised. One aspect that is currently not considered by league tables is the diversity of the student intake, and the extent to which an institution has been successful in widening participation (WP) in higher education (HE). The need to take action is reflected in target 4.3 of the fourth United Nations Sustainable Development Goal (SDG 4), which aims to “ensure equal access for all women and men to affordable and quality technical, vocational and tertiary education, including university” by 2030. This article explores how current ULT metrics for universities in the United Kingdom (UK) relate to WP. Using publicly available data, the authors found that over 75% of UK league table metrics are negatively related to WP. This has the effect of making institutions with a diverse student body significantly more likely to be lower down in the league tables. The worst relationship with WP is for entry standards. Universities which recruit high-performing students are actively rewarded in the league tables; this fails to recognise that students with high entry grades are more likely to come from privileged backgrounds. The authors developed a ULT which includes a WP score as an explicit league table metric and found that their WP-adjusted table removed the negative relationship between WP and league table rank, resulting in a somewhat fairer comparison between universities. 
They conclude that ULT compilers have an ethical duty to improve their definition of a “good” university, which in the current HE environment of the UK must include WP. The authors believe this should be an urgent priority for the sector, so that universities with a commitment to widening participation can be recognised and rewarded.


Introduction
University League Tables (ULTs) are an inescapable feature of contemporary higher education. In an attempt to identify the "best" universities, several organisations compile league tables of higher education institutions (HEIs) at either an international or national level. The publication of the latest league tables typically results in significant press coverage about the rise or fall in prestige of individual institutions, and many universities use their league table position as a central tool in their recruitment and marketing to prospective staff and students. Most institutional senior management teams name improving their league table position as a core goal (Hazelkorn 2009). League table position can also have impacts on individual students; for example, the granting of visas to international students can be contingent upon their university's position in the Academic Ranking of World Universities (ARWU) (Hazelkorn 2007; Hosier and Hoolash 2019).
League table curation is associated with several different (potentially conflicting) purposes. Some journalistic outlets have framed university rankings in terms of student choice, arguing that since students generally attend a single institution for their undergraduate studies, information about institutional prestige should be available to potential students when they consider their options (Bowden 2000). Some rankings seem more aligned with politics, framed in terms of international policy goals such as advancing a "knowledge economy" (BIS 2016; Dill and Soo 2005), or national policy goals such as social justice (HEPI 2018). Accordingly, rankings can have political consequences. Although originally developed to provide students and parents with information to inform their course choices, adoption of national and international league tables has resulted in a more competitive higher education (HE) market (Hazelkorn 2009). As a consequence, the use of league tables is also shaping institutional strategy, and placing increasing pressure on academic and professional services staff to target their activities towards things that will improve institutional league table performance (Hazelkorn 2007, 2009; Locke et al. 2008).
Although ULTs have made a major impact on universities, there is broad agreement that they are flawed instruments in terms of their usefulness in measuring the diversity of activities taking place within universities (Hosier and Hoolash 2019; Locke et al. 2008; Lynch 2015; Tofallis 2012). Ever since their conception, league tables have been roundly critiqued for a broad range of reasons. Some of these concern technical criticism of the statistical methodologies used to generate the ranking hierarchy. For example, Chris Tofallis offers a close examination of the additive methodology used to compile scores from different indicators, and develops a statistical critique of certain normalisation algorithms (Tofallis 2012). However, most critics focus on the link between the raw data used and the conclusions drawn from their analysis (Bowden 2000; Dill and Soo 2005; Locke et al. 2008).
Reducing HEIs to a single numerical rank also has the effect of devaluing the broader impacts that universities have in society, which are much less easily measured. For example, in the European tradition, a core purpose of education has always been its role in preparing people for civic life (Plato 2003; Dewey 1923). Universities are often described as having a role in pursuing social justice, which Mala Singh defines as a search for a fair (not necessarily equal) distribution of what is beneficial and valued as well as what is burdensome in a society (Singh 2011, p. 482).
The reproductive mechanism of education makes it an attractive policy space to address wider social injustice (ibid.). It is argued that by engaging in HE, students from disadvantaged backgrounds become more able to access societal resources, be it in purely economic terms or through accumulating social capital. Sociological analysis has described education as a site of social reproduction, in which learners adopt the dispositions required to succeed in their societies (Nash 1990). Universities can also redistribute resources through engaging with society on key issues, working towards solutions through research and knowledge exchange activity. Social justice will mean very different things in different national contexts. For example, the dominant concern might be socio-economic disadvantage in one country, but ethnic disparities in another. It has been disputed whether universities genuinely redistribute resources or not (Case 2017; Singh 2011), but the ideals of social justice are widespread within the HE sector. This broader civic role of universities is almost impossible to capture numerically, so is not well represented in league tables that depend on quantitative methodology.
The central contention of ULT criticism is that there is no objective construct of the "goodness" of all universities: on what scale are diverse institutions being ranked? Whatever construct universities are being ranked on is created by the ULT methodology rather than describing some objective truth about a varied sector (Hosier and Hoolash 2019; Locke et al. 2008). As such, league table compilation is an inherently political act, yet rankings are presented as an objective truth about the sector (Lynch 2015). Even amidst the sustained academic critique, there is some recognition that the "scorecard" approach to collating pertinent data might have some value if properly executed (Tofallis 2012), although this comes with the recognition that relying on ULTs as a basis for decision-making can promote contradictory values and goals (Hosier and Hoolash 2019). The selection of indicators is particularly important in the ULT process. There is a strong suspicion in the literature that ULT compilers are "led by the nose" and use readily available data rather than appropriate data (Dill and Soo 2005). This suspicion is deepened by the general lack of any stated rationale for selecting or weighting the indicators, either in terms of ideology or statistical validity.

Use and impact of international and national league tables
There are many ULT systems in use, some of which rank institutions within a given country, while others attempt to compare universities at a global level. Internationally, the most commonly used league tables are the World University Rankings of Quacquarelli Symonds (QS), the Academic Ranking of World Universities (ARWU) and Times Higher Education (THE) World University Rankings. International rankings of HEIs are compiled in various ways, but always prominently feature indicators relating to research productivity, such as the number of Nobel Prize winners among current or past staff and students (e.g. ARWU) or the number of articles published in or cited from international databases (e.g. ARWU, THE) (Buela-Casal et al. 2007). Such indicators bias rankings in favour of HEIs with publications in English databases and scientific fields (Ordorika and Lloyd 2015). Some ranking systems also incorporate a "peer review" component (e.g. THE), where research-active academics are asked to identify the top universities in their subject areas (Buela-Casal et al. 2007). This has opened international ULTs up to criticism of cultural imperialism, since these biases make the rankings into a "Harvardometer" by embedding an Anglo-Saxon view of "the University" as an elite research establishment into the exercise (Ordorika and Lloyd 2015; Somers et al. 2018).
This homogenising assessment influences the behaviours of HEIs and policy actors at both institutional and national levels. Imanol Ordorika and Marion Lloyd document varied responses to the hegemony of standardised metrics (Ordorika and Lloyd 2015). In Brazil, QS's apparent endorsement of the performance of privately funded HEIs (QS 2012) failed to account for both the poor quality of many private HEIs and the social mission of universities in many Latin American countries (Kinser and Levy 2005), provoking sustained resistance from national bodies. In France, apparently poor ULT performance has motivated the merging of HEIs into "supercampuses" which can compete with large institutions in the United States (Labi 2010). Several countries consider the ULT ranking of students' chosen institutions when deciding whether to grant a study visa (Hazelkorn 2007; Luxbacher 2013; Hosier and Hoolash 2019).
International rankings are apparently both powerful and imperfect, even when relying predominantly on seemingly comparable research outputs (Bowden 2000; Buela-Casal et al. 2007). However, universities also have non-research functions such as fostering civic citizenship or technical education, and these functions will legitimately vary between nations (Luke and Hogan 2006). To reflect this, regional and national ranking systems have also been developed, which may or may not correlate with the international league table positions. These national league tables vary considerably in their component metrics. For example, Japanese universities are primarily ranked in terms of hensachi, a selectivity score which indicates the relative ranking of students based on mock test results (Yonezawa 2010), while the QS University Rankings for Latin America incorporate metrics such as employer reputation and the extent of relationships between the institution and other universities outside of the region (QS 2019). We believe this national-level approach is particularly appropriate for attempts to evaluate teaching, because countries' national secondary education systems and economic policy objectives differ, as do the available data sources (which in general will be difficult to compare) (Dill and Soo 2005). For example, entry tariff might mean something different in a country with a small, elite public HE system than in states pursuing a mass participation model. However, quality of teaching will, in many contexts, be a vital part of what makes a good university. We therefore chose to focus our attention on the HE sector in the United Kingdom (UK) to consider the rankings of universities within a specific context, and how these rankings reflect what is considered a "good" university.

The United Kingdom league table context
The UK provides a useful case study for the compilation of league tables, since there are large amounts of publicly available data on each HEI. The first ULT in the UK was published in the Times newspaper in October 1992; there are now three independent league tables published each year (Turnbull 2018). These are produced by (1) the Times and Sunday Times newspapers, published as the Good University Guide (the most recent one is O'Leary 2020); (2) the Guardian newspaper (most recent is Guardian 2020); and (3) the Complete University Guide (most recent is CUG 2020).
All three have the stated aim of informing prospective students' choice of university. All use different methodologies, and these methodologies have been criticised for being non-transparent and non-reproducible, particularly in terms of the way data are normalised and presented as "scores" (Tofallis 2012). The methodology used for each table has changed over time; for example, the Times league table originally used 14 equally weighted indicators of quality, and now uses eight metrics with unequal weightings (Bowden 2000). Only the Guardian and Complete University Guide tables have publicly available results and methodologies; the indicators and weightings included in these ULTs are listed in Table 1. Use of different metrics results in different lists; for example, in 2019, only six institutions were found among the top 10 in all three league tables. This demonstrates that there is no agreement on the "right" way to compile a league table even within the UK.
UK ULTs, like their international counterparts, have been heavily criticised, both as a whole (e.g. Bowden 2000) and with a focus on individual metrics. One UK-specific criticism is the use of student satisfaction scores via the National Student Survey (NSS) 6 (Grove 2017). NSS ratings are argued to be poor proxy measures of teaching quality (Botas and Brown 2013; Langan and Harris 2019) and subject to biases (Bell and Brooks 2019). Since the NSS is completed before students graduate, NSS ratings also fail to capture the ongoing benefits of completing a degree, which may be highly relevant to a social justice narrative, but can take many years for students to appreciate (Leach 2019). Despite these criticisms, the NSS is a central feature of all three league table calculations, and can constitute up to one quarter of a university's ranking (Table 1).

Widening participation is a requirement for UK universities, but is not currently captured in ULTs
One striking example of the political choices made in league table compilation is the absence of any indicator that reflects the diversity of a university's intake. As such, the current UK league tables do not capture a social justice agenda. It has long been noted that participation in HE is not equally distributed across the UK population; the Robbins Report of 1963 highlighted that young people whose parents were manual workers were much less likely to attend university than those with professional parents (Robbins 1963). This focus on disparities in student intakes was extended 34 years later by the Dearing Report, which considered widening participation in terms of socio-economic status, disability, mature student status and ethnicity (Dearing 1997). Improving access to higher education benefits individuals and communities, and society in general, but widening participation (WP) efforts have had mixed success, and disparities still persist (Vignoles and Murray 2016). For example, in some areas of the UK, fewer than 10% of young people attend university, whereas in other areas there is 100% engagement (HEFCE 2017). Some universities have made considerable efforts to engage non-traditional groups in HE, whereas others disproportionately recruit from populations of students who are already highly likely to enter HE anyway (HEPI 2018). This means that students from disadvantaged backgrounds are less likely to attend "elite" institutions, and that those who do gain entry are less likely to have peers from similar backgrounds.
Footnote 6 (continued): "… universities and colleges to improve the student experience; [and] support public accountability. Every university in the UK takes part in the NSS, as do many colleges" (OfS 2020).
There is evidence to suggest that a diverse student body has a positive impact on the student experience, and that interactions with a diverse cohort of students can benefit individual students considerably, not least in terms of their learning outcome (Gurin et al. 2002; Shaw 2009). Students themselves recognise that educational disadvantage has an impact on the chances of going to university, and generally support the principle of contextual admissions decision-making (Dale-Harris 2019). 7 In the UK, considerable attention is being paid to university admissions in the popular press, and governments and HEIs have been severely criticised for failing to make university access more equitable (Heselwood 2018; Montacute and Cullinane 2018). At the same time, currently well-represented groups defend the status quo in the press, with some criticising actions aimed at improving equity of access (Weale 2020). Reflecting the political importance of widening access in the UK, policies and action plans to address WP are already regulatory requirements for all universities receiving public funds (although the issue is somewhat complicated by the devolution of HE policy in the UK). 8 The Higher Education Act 2004 required universities wishing to charge higher tuition fees to produce a plan to illustrate how students from disadvantaged backgrounds were being encouraged to participate in HE (Gov UK 2004). This was overseen by the new Office for Fair Access (OFFA), which has since been replaced by the Office for Students (OfS), established through the Higher Education and Research Act 2017 (Gov UK 2017). Extending the commitment to WP through the registration conditions for HEIs, the OfS now requires all UK universities to develop an access and participation plan which is updated on an annual basis (OfS 2018). As such, WP is now a regulatory requirement for the entire UK HE sector, and we therefore believe it is fair to judge universities on whether they have succeeded in widening participation.
7 The principle of contextual admissions takes into account information beyond applicants' grades, such as residence in an area with a low number of HE participants and/or having recognised refugee status. The purpose is to factor in personal circumstances which have caused significant disruptions to an applicant's education and therefore had a negative impact on their studies. Applicants provide this personal information in their university application documents, either through a personal statement or other demographic information such as their postcode.
8 Up to 1992, HE policy was made centrally for all parts of the UK, including Scotland, Wales and Northern Ireland. The devolution of HE policy in the UK was introduced through the Further and Higher Education Act (Gov UK 1992), which created distinct funding and regulatory arrangements for the separate nations.
While all publicly funded UK institutions are required to consider WP in their registration with OfS, some institutions explicitly make a commitment to diversifying the student body in their formal mission statements. In the UK, this is formalised through the existence of three university "mission groups" (the Russell Group, the University Alliance and the MillionPlus Group), which account for roughly 50% of UK HEIs. Twenty-four universities are members of the Russell Group, which prioritises "maintaining the very best research, [and] an outstanding teaching and learning experience" (Russell Group n.d.), but does not make a commitment to WP. Eighteen institutions are members of the University Alliance (UA), which explicitly makes a commitment to WP in its mission statement, aiming to open up equality of opportunity to all, regardless of background, ability and experience (UA n.d.).
Finally, the MillionPlus group represents 17 "modern universities" (i.e. those given university status under the Further and Higher Education Act 1992; Gov UK 1992). The MillionPlus group also actively incorporates WP in its mission statement, stating an aim of supporting anyone who has the ambition, talent and desire to succeed in higher education, whatever their background and wherever they live in the UK (MillionPlus n.d.).
The University Alliance and MillionPlus groups thus make specific commitments to social justice through their core activities. The subjective choice of league table indicators and weightings may therefore favour particular mission groups, meaning that universities whose missions do not align well with ULT metrics may be unfairly penalised through being ranked on the same basis (Hazelkorn 2009; Locke et al. 2008, p. 57).
Within the UK league tables, there is only one example of a league table metric which attempts to reflect the fact that universities do not have equivalent intakes, namely the Guardian's "Value Added" metric.
Based upon a sophisticated indexing methodology that tracks students from enrolment to graduation, qualifications upon entry are compared with the award that a student receives at the end of their studies … an institution that takes in lots of students with low-entry qualifications - who are less likely to achieve firsts or 2:1s - will score highly in the value-added measure if the number of students doing so exceeds expectations (Hiely-Rayner 2019).
This metric attempts to recognise that some institutions admit students with lower tariff scores, so should be rewarded for those students being awarded high degree classifications. There is therefore an expectation that some institutions can expect to see lower average tariffs - but higher value-added scores (ibid.).
While it would seem fair to reward institutions admitting students from diverse backgrounds, this is not a direct indicator of WP, but instead a measure of student attainment normalised for students' entry scores. As such, explicit consideration of WP is currently absent from UK ULTs. We therefore think there is an urgent need to consider the relationship between WP and league table rankings, and to identify and include practical methods in the compilation of ULTs for rewarding institutions which successfully improve social mobility.

Aims of the current study
In this article, we explore the relationship between WP and methods of league table curation. Our specific research questions are as follows: […] In posing these questions, we hope to contribute constructively to developing league tables which communicate a more accurate description of what it means to be a good university.

Measures of equality and participation
If WP is going to be incorporated into ULTs, a numerical measure of student diversity is required. The most direct measure of participation in HE for the UK is the Participation of Local Areas (POLAR) system, re-calculated every few years by OfS. Each local area (identified by postcode) is given a score of 1-5 depending on the percentage of young people from that ward (administrative section) or area who enter HE. The areas are then grouped into five groups according to their score, with each group or quintile representing 20% of the UK. The lowest quintile (Quintile 1, comprising "low participation neighbourhoods") represents the UK's 20% of young people least likely to attend university, while the top quintile (Quintile 5) represents those young people most likely to enter HE. For each HEI, the percentage of students admitted from each of these quintiles is publicly available from data provided by the Universities and Colleges Admissions Service (UCAS) on its website (UCAS 2019). However, comparing HEIs across five different numbers is non-intuitive (i.e. not immediately plausible or helpful) and challenging. One option, namely reducing these data to measure only the proportion of students in the lower POLAR Quintiles 1 and/or 2, simplifies this analysis, but at the cost of removing potentially informative data from the calculations.
Gini coefficients offer a solution here (Martin 2018). These coefficients are a single measure of inequality, often used in economics to assess income inequality within a nation or region. Developed by Corrado Gini in 1912 (Gini 1955 [1912]), the coefficient ranges from 0 to 1, with 0 representing perfect equality and 1 representing perfect inequality. In the context of widening HE participation and the POLAR system, a Gini coefficient of 0 represents an HEI drawing its student body equally from each of the five quintiles, and is therefore representative of the national student body. A score of 1 would indicate that all students within an HEI come from the same POLAR quintile (likely the top Quintile 5, the highest participating group by definition). Therefore, the lower the Gini coefficient of a given HEI, the greater equality of background in its student body.
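To make the idea concrete, the following minimal Python sketch computes a Gini coefficient from the shares of an HEI's intake drawn from the five POLAR quintiles. It is an illustration only: the function name `gini_from_shares`, the mean-absolute-difference formula and the example shares are assumptions, and the calculation behind the coefficients in Martin (2018) is not described here and may differ. With this formulation and five groups, the unscaled coefficient peaks at 0.8 when all students come from one quintile; the optional `normalise` flag rescales by n/(n-1) so that this extreme maps to 1.

```python
def gini_from_shares(shares, normalise=False):
    """Gini coefficient of a list of non-negative group shares.

    Uses the mean-absolute-difference form:
        G = sum_i sum_j |x_i - x_j| / (2 * n * sum_i x_i)
    ASSUMPTION: this is one common formulation, not necessarily the one
    used to produce the published POLAR Gini coefficients.
    """
    n = len(shares)
    total = sum(shares)
    if n < 2 or total <= 0 or any(s < 0 for s in shares):
        raise ValueError("need at least two non-negative shares with a positive sum")
    abs_diff_sum = sum(abs(a - b) for a in shares for b in shares)
    g = abs_diff_sum / (2 * n * total)
    if normalise:
        # Rescale so that 'everything in one group' gives exactly 1.
        g *= n / (n - 1)
    return g


# Hypothetical intake shares for POLAR Quintiles 1 to 5 (not real data).
example_shares = [0.08, 0.12, 0.18, 0.25, 0.37]
example_gini = gini_from_shares(example_shares)
```

An HEI recruiting 20% of its students from each quintile would score 0 under either variant, matching the interpretation given above.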
For this study, we used the POLAR-based Gini coefficients graphically presented by Iain Martin (2018) in a policy note published by the Higher Education Policy Institute (HEPI). We chose this measure of WP success for a variety of reasons, primarily the directness with which it measures participation in HE. While other metrics (e.g. the Multiple Equality Measure [MEM] developed by UCAS) include factors that show differences in participation (e.g. gender or ethnicity), Gini coefficients provide a succinct and direct measure of exactly the behaviour we wish to promote, namely participation. Further, POLAR Gini coefficients are built upon an established methodology for measuring inequality, and data are readily available. Gini coefficients therefore represent a convenient existing tool that league table compilers could easily incorporate into ULTs without the need to develop and verify new methodologies. Finally, Gini coefficients provide a broad, holistic measure of participation equality that is represented in a single value, simplifying inclusion into ULTs.

Data sources
Numerical Gini coefficients presented in the HEPI policy note were kindly provided by Iain Martin (Martin 2018). HEIs in this dataset were assigned a mission group as either Russell Group (RG), University Alliance (UA), the MillionPlus Group (M+), the former 1994 Group (1994), specialist providers (e.g. arts colleges) or no overall mission group (None). Since the 1994 Group is no longer active, we reassigned its former member institutions (WhatUni n.d.) as "None".
To compare Gini coefficients with other institutional metrics, we built a dataset that included league table information for both the Guardian and Complete University Guide ULTs. Since the HEPI Gini data were published in April 2018, we based the analysis presented below on information that was publicly available in 2018 to allow for meaningful comparison. We obtained whole-institution league table data from the Guardian University League Table […]. Our dataset included HEIs for which Gini, Guardian ULT and Complete University Guide data were available. Institutions for which one or more indicators were not available were removed from the dataset. To compile the data, we used the UK Provider Reference Number (UKPRN) for each institution to provide a unique lookup value for each university. Where datasets did not include a UKPRN (e.g. the Complete University Guide), we manually assigned the institutional name used in the league table to the relevant UKPRN. We then double-checked this for 10% of the institutions in the dataset and found no errors in assignment. Using these criteria, only one specialist college remained in the dataset (University of the Arts, London), so we also removed this HEI. This resulted in a final dataset of 118 institutions.
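The compilation step described above amounts to an inner join on UKPRN followed by removal of institutions with missing indicators. The Python sketch below illustrates that logic under stated assumptions: the function name `merge_by_ukprn`, the field names and the UKPRN values are all hypothetical placeholders, and the real datasets contain many more indicators.

```python
def merge_by_ukprn(*datasets):
    """Inner-join several {ukprn: {field: value}} mappings.

    An institution is kept only if it appears in every dataset and
    none of its merged fields is missing (None).
    """
    if not datasets:
        return {}
    common = set(datasets[0])
    for data in datasets[1:]:
        common &= set(data)  # keep UKPRNs present in all sources
    merged = {}
    for ukprn in sorted(common):
        row = {}
        for data in datasets:
            row.update(data[ukprn])
        if all(value is not None for value in row.values()):
            merged[ukprn] = row
    return merged


# Hypothetical records keyed by made-up UKPRNs (illustration only).
gini_data = {10000001: {"gini": 0.21}, 10000002: {"gini": 0.35}, 10000003: {"gini": 0.28}}
guardian_data = {10000001: {"guardian_score": 61.0}, 10000002: {"guardian_score": None}}
cug_data = {10000001: {"cug_score": 640}, 10000002: {"cug_score": 810}}

combined = merge_by_ukprn(gini_data, guardian_data, cug_data)
```

In this toy example only the first institution survives: the second has a missing Guardian indicator and the third is absent from both league tables.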
We recognise that our inclusion criteria somewhat restrict the comparability of our ULTs with published ULTs. However, we consider our dataset sufficiently complete to provide a useful illustration of the relationship between WP and institutional rankings in ULTs.

Statistical analysis
We compared the individual components of both Guardian and Complete University Guide ULTs to assess correlations between Gini coefficients and the metrics included. Most metrics in the ULTs are positively scored (i.e. a higher score indicates more favourable performance). However, Student:Staff ratio (SSR) is negatively scored, with low SSRs indicating smaller class sizes, which are positively rewarded in the ULT calculation. To make the correlation direction consistent for improving ULT performance, we multiplied the SSR metrics within each ULT by -1.
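The sign convention for SSR can be illustrated with a short Python sketch. The analysis itself was run in R; this stand-alone version is only a sketch, and the helper `pearson_r` and all data values are hypothetical. Multiplying a metric by -1 reverses the sign of its Pearson correlation with the Gini coefficient while leaving the magnitude unchanged.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for five institutions (not real data).
gini = [0.10, 0.15, 0.22, 0.30, 0.35]
ssr = [18.0, 17.0, 15.5, 14.0, 13.0]  # lower SSR is rewarded in the ULT

r_raw = pearson_r(gini, ssr)
# Flip the sign so that a higher value always means better ULT performance.
r_flipped = pearson_r(gini, [-value for value in ssr])
```

In this made-up example the raw correlation is negative and the flipped one positive, so after the adjustment SSR can be read in the same direction as the positively scored metrics.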

We calculated correlation coefficients and p-values using the programming language R (R Core Team 2018) and the software RStudio (RStudio Inc. 2016), with the package Hmisc (Harrell 2019). To reduce the probability of false positive results (i.e. indicating a relationship that does not exist), p-values were Bonferroni-corrected. 13 To allow easy comparison between the two ULTs, which contain a different number of metrics (as shown in Table 1), we maintained a constant significance level α = 0.05, and instead multiplied the p-values by the number of metrics within each ULT. For example, the Guardian ULT contains 9 metrics, so we multiplied p-values by a factor of 9 (adjusted p-values that exceeded the maximum of 1 were assigned the value 1).
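The adjustment described above can be sketched in a few lines of Python. This is an illustrative re-expression rather than the R code used for the analysis; the function name `bonferroni_adjust` and the raw p-values are hypothetical.

```python
def bonferroni_adjust(p_values, n_tests):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return [min(1.0, p * n_tests) for p in p_values]


ALPHA = 0.05  # significance level held constant across both ULTs

# Hypothetical raw p-values for three metrics in a nine-metric ULT.
raw_p = [0.001, 0.02, 0.3]
adjusted_p = bonferroni_adjust(raw_p, n_tests=9)
significant = [p < ALPHA for p in adjusted_p]
```

Scaling the p-values rather than the threshold means that adjusted values from the nine-metric and ten-metric tables can be compared against the same α.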
Further, we considered the direction of correlation for all ULT metrics that showed a significant correlation to the Gini coefficient, and mapped this to the weighting each metric received within the ULT. Since a low Gini coefficient indicates an institution that recruits successfully from underrepresented local areas/postcodes, a positive correlation with the Gini coefficient indicates a negative relationship between the metric and WP. Similarly, a negative correlation with the Gini coefficient indicates a positive relationship between the metric and WP. This enabled us to determine how much of each ULT runs counter to WP agendas within the UK HE sector.
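The mapping from correlation direction to weighting can be sketched as follows. The Python below is a minimal illustration under stated assumptions: the function names, metric names, weights, correlations and p-values are all invented for the example and are not taken from either ULT.

```python
ALPHA = 0.05


def wp_relationship(r, adjusted_p, alpha=ALPHA):
    """Classify a metric's relationship with WP from its correlation with Gini.

    A significant positive correlation with Gini means the metric rewards
    less diverse intakes, i.e. a negative relationship with WP.
    """
    if adjusted_p >= alpha:
        return "none"
    return "negative" if r > 0 else "positive"


def weight_by_relationship(metrics):
    """Sum ULT weightings (in %) by relationship with WP.

    `metrics` is an iterable of (name, weight_percent, r, adjusted_p).
    """
    totals = {"negative": 0.0, "positive": 0.0, "none": 0.0}
    for _name, weight, r, adjusted_p in metrics:
        totals[wp_relationship(r, adjusted_p)] += weight
    return totals


# Hypothetical three-metric league table (illustration only).
toy_metrics = [
    ("Metric A", 50, 0.60, 0.001),   # significant, positive r with Gini
    ("Metric B", 30, -0.30, 0.020),  # significant, negative r with Gini
    ("Metric C", 20, 0.10, 1.000),   # not significant
]
toy_totals = weight_by_relationship(toy_metrics)
```

In this toy table, half of the weighting runs counter to WP, 30% aligns with it, and the remaining 20% shows no significant relationship.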

Relationships between WP and existing ULT indicators
ULTs do not currently include any direct measure of widening participation, despite WP being a vital part of the UK HE landscape. However, this does not mean that ULTs do not indirectly reward, or punish, a WP agenda. We therefore first established whether the metrics included in the Guardian and Complete University Guide league tables correlated with the diversity of the student population, as measured by Gini coefficients.
We found significant correlations between the Gini coefficient of an HEI and the institution's performance on the majority of factors considered within both the Guardian and Complete University Guide ULTs (Figure 1). If ULTs do incentivise WP, we would expect a significant negative correlation between league table rank and Gini coefficient, since lower Gini coefficients indicate better recruitment from underrepresented groups. However, for the Guardian ULT, there was a significant positive correlation between league table ranking and Gini coefficient (Pearson correlation coefficient 0.49, p < 0.01), which means that there was a significant negative correlation between WP and league table position (Figure 1A). This correlation was similar for all university mission groups (Figure 1B). The positive relationship between league table performance and Gini coefficient was even stronger for the Complete University Guide table (Pearson correlation coefficient 0.64, p < 0.01; Figure 1C), and was true for all mission groups other than the University Alliance. For both league tables, there was therefore a significant trend for those institutions which recruited a large proportion of their students from underrepresented local areas/postcodes to be placed lower down the league table ranking.
13 A Bonferroni correction is a statistical method used to avoid false positive results when multiple tests are being run on the same dataset.
Having established a correlation between overall league table position and Gini coefficient, we then wanted to determine which of the individual league table metrics were correlated with Gini. Within the Guardian ULT, we found that seven of the nine factors considered in this league table correlated with Gini coefficient; these factors accounted for 85% of the weighted total score (Table 2). Of these, six (75% weighting) had a correlation coefficient greater than zero, indicating that increased ULT performance for that metric correlates with decreased diversity (an increase in Gini coefficient). Only one factor, the National Student Survey (NSS) score for Assessment and Feedback, worth 10% of the overall score, correlated with the Gini coefficient in a direction that aligns WP and ULT performance.
In the Complete University Guide ULT, eight of the ten factors significantly correlated with Gini coefficient. These eight factors accounted for 78% of the weighted total score (Table 3), and all eight had a negative relationship with WP. There were therefore no metrics in the Complete University Guide that had a positive relationship with WP.
In both ULTs considered, the strongest correlation between Gini and ULT factors was for entry tariffs. Higher Entry Scores (Guardian) and Entry Standards (Complete University Guide) mapped closely to Gini coefficient, with those HEIs accepting students with high tariff scores having higher Gini coefficients, and therefore lower WP success.

Calculation of a WP-adjusted league table
Having established that the existing league table metrics indirectly penalise a WP agenda, we decided to model the effects of introducing WP as a league table metric in its own right. We used the Guardian ULT as the basis for our modelling approach, as this league table is the most "student-facing" (i.e. addressing students' needs, interests and perspectives, and reducing the dependency on research-based metrics). The Guardian newspaper itself also places a high value on social justice.
Table 2 Correlations between Gini coefficient and the factors making up the Guardian ULT
Notes: Bonferroni-adjusted p-values less than 0.05 show a significant correlation. Relationships with WP are indicated; as Gini coefficients are reverse-scored, a positive correlation between Gini coefficient and the league table metric indicates a negative relationship with WP.
Column headings: Guardian University League Table indicator; Weighting %; Correlation with Gini
To model the effect of including WP in league table calculations, we adopted a relatively straightforward approach based on inclusion of Gini coefficients as a WP metric. While we do not claim that this is the most statistically appropriate or robust way in which Gini could contribute to league table calculations, we present it as an illustrative example of the potential impact of including a WP metric. We performed our calculations as follows:
Step 1: Normalising the Gini coefficient to obtain a positively scoring "WP points" variable
Gini values are negatively scored (low Gini coefficients indicate more socially diverse institutions), and do not clearly relate to an observable characteristic. We therefore did not feel that direct inclusion of Gini coefficients would be helpful to the compilation of a league table, since it would be difficult for prospective students to interpret a raw Gini coefficient. Instead, we calculated "WP Points", whereby institutions with a perfect Gini coefficient would be awarded 100 points, and the worst (highest) Gini coefficient in the HE sector would be awarded 0 points (Equation 1). As such, our method is based on the "Dividing by the largest value" approach to normalisation as described by Tofallis (2012).
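The normalisation in Step 1 can be sketched as follows. The functional form used here, WP points = 100 × (1 − Gini / largest Gini in the sector), is an assumption inferred from the description above (a Gini coefficient of zero scores 100 points and the highest Gini coefficient scores 0 points); it should not be read as a verbatim reproduction of Equation 1. The function name and the institution names and values are hypothetical.

```python
def wp_points(gini_by_institution):
    """Convert Gini coefficients into positively scored WP points (0 to 100).

    Assumed form: 100 * (1 - gini / largest gini in the sector), i.e. a
    "divide by the largest value" normalisation followed by reversal.
    """
    worst = max(gini_by_institution.values())
    if worst <= 0:
        raise ValueError("the largest Gini coefficient must be greater than zero")
    return {
        name: 100.0 * (1.0 - gini / worst)
        for name, gini in gini_by_institution.items()
    }


# Hypothetical Gini coefficients, for illustration only.
example_gini = {"Institution A": 0.0, "Institution B": 0.2, "Institution C": 0.4}
example_points = wp_points(example_gini)
print(example_points)
```

Under this assumed form, an institution whose Gini coefficient is half the sector maximum would receive 50 WP points.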
Step 2: Recalculating the league table to incorporate WP points
We then included the WP points score in the metrics, and calculated a WP-adjusted Guardian score using Equation 2, where X is the weighting factor for Gini.
WP-adjusted Guardian score = ((1 − X) × Guardian Score) + (X × Gini Normalised Score)   (2)
We set X at 0.1325, giving Gini the same weighting as Entry Scores, Career Prospects, Student:Staff Ratio (SSR) and Value Added in the revised calculations. We then calculated league table position on the basis of the WP-adjusted Guardian score (see footnote 14). Viewing the UK HE sector as a whole, recalculation of the league table to include WP meant that there was no significant correlation between the Gini coefficient and league table position (Figure 2C; Correlation = 0.11, p = 0.22), thereby removing the bias towards institutions with poor WP (compare Figures 2A and 2C). In the WP-adjusted league table, there was still a positive correlation between the Gini coefficient and league table position for the Russell Group, but for other institutions there was a slight negative correlation (Figure 2D).
The inclusion of WP as a league table metric also reduced the effect of university mission group on league table position (Figure 4). In the non-adjusted league table there was a significant impact of mission group on ULT ranking (Kruskal-Wallis H = 40.38, d.f. = 3, p < 0.001), with significant differences between the rankings of Russell Group and University Alliance members (Bonferroni-corrected Wilcoxon post-hoc test, p = 0.002), Russell Group and MillionPlus members (p < 0.001) and between University Alliance and MillionPlus institutions (p = 0.005). In the WP-adjusted league table, there was still a significant difference in rankings between the different mission groups, although the effect was smaller (H = 5.77, d.f. = 3, p < 0.001). However, in the WP-adjusted league table there was no significant difference between the Russell Group and University Alliance (p = 1), indicating the revised league table rewarded different mission groups more equitably. There were still significant differences between both of these mission groups and the MillionPlus group (Russell Group p < 0.001; University Alliance p < 0.001). For those interested in the outcomes of individual institutions in our analysis, we present annotated versions of Figures 2B+D as Figure 3.
Recognising that institutional behaviours are influenced through individual positions in ULTs, we extended the analysis to individual HEIs. We found that including WP points as a metric had a direct impact on the ranking of institutions in the adjusted league table (Table 4).
Thirty-three institutions rose more than 10 places in the league table, and 34 fell more than 10 places. While our institutional inclusion criteria make rank comparisons between league tables somewhat imprecise, the most striking risers in the table were Chester (+36) and Worcester (+31), while the most significant fallers were Aberdeen (-44), Edinburgh (-35) and King's College London (-35). Including WP as a metric had no impact on the top 5 institutions in the table; however, the rest of the table was reordered once WP had been incorporated (Table 5). For example, Coventry University rose from 13th to 6th position after inclusion of WP.
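The recalculation in Step 2 and the resulting changes in rank can be sketched as follows. The weighting X = 0.1325 and the form of the WP-adjusted score follow Equation 2; the institution names, Guardian scores and WP points are hypothetical placeholders, and the helper functions are illustrative rather than the code used for our analysis.

```python
X = 0.1325  # weighting factor for the Gini-based WP metric, as stated in Step 2


def wp_adjusted_score(guardian_score, wp_points, x=X):
    """Equation 2: blend the Guardian score with the normalised Gini (WP points)."""
    return (1.0 - x) * guardian_score + x * wp_points


def rank_positions(scores):
    """Map each institution to its league table position (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Hypothetical scores on a 0 to 100 scale, for illustration only.
guardian = {"Institution A": 80.0, "Institution B": 78.0, "Institution C": 60.0}
points = {"Institution A": 0.0, "Institution B": 50.0, "Institution C": 100.0}

adjusted = {name: wp_adjusted_score(guardian[name], points[name]) for name in guardian}
before = rank_positions(guardian)
after = rank_positions(adjusted)
# Positive values are places gained in the WP-adjusted table.
change = {name: before[name] - after[name] for name in guardian}
print(after, change)
```

In this toy example Institution B overtakes Institution A once WP points are included, mirroring on a small scale the kind of reordering reported in Tables 4 and 5.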

A strong negative relationship between most league table metrics and WP
Our analysis indicates that a significant proportion of the existing UK league table metrics are positively correlated with Gini coefficients, meaning that these metrics have a negative relationship with WP. We found a significant correlation between improved ULT performance and decreased diversity in the student body for 75% of the weighted total of the Guardian ULT, and for 78% of the weighted Complete University Guide ULT. The strongest correlation in both ULTs was between entry tariffs and Gini coefficient. Students from relatively disadvantaged backgrounds have been shown to perform as strongly as their more advantaged peers when admitted with lower entry tariffs (Hoare and Johnston 2011; Mountford-Zimdars et al. 2015). At the same time, admitting these students negatively impacts an HEI's ULT performance. This is a clear contradiction: the laudable efforts of "good" HEIs seeking to improve both WP and ULT performance by opening up to disadvantaged-yet-talented students actually seem to decrease ULT performance.

A negative correlation between WP and the Value-Added metric in the Guardian ULT
The most unexpected relationship we found in our analysis was the negative relationship between WP and the Value-Added metric in the Guardian ULT (correlation with Gini = 0.37, p < 0.001). Given that Value Added is defined in terms of students with lower entry grades attaining a first or upper second-class honours degree (see footnote 9), it might be expected that this indicator would reward universities with more diverse student cohorts. However, our analysis indicates that Value Added is not a sufficiently good metric to directly reward institutions with a commitment to WP. The reasons for the negative relationship revealed in our analysis are unclear.
For HEIs admitting lower-tariff students, high "value added" may mean the difference between a student failing or graduating with a 2.2 (a lower second-class honours degree). However, this would not be recognised in the Value-Added measure in the league table. Given the close correlation between entry tariffs and Gini coefficients, diverse HEIs giving lower tariff students a chance may be punished by this measure if they "only" improve disadvantaged students to a 2.2 level, compared to a higher-tariff, less diverse HEI taking potential 2.2 students and adding the small amount of value required for these students to reach a 2.1 grade. While Value Added is a useful metric, our analysis indicates it is still negatively related to WP, so does not provide a mechanism for institutions with diverse student cohorts to be recognised in the league tables. A further result of interest is that one ULT metric, namely NSS scores for Assessment and Feedback, does correlate positively with more diverse student intake. Again, the reasons for this are unclear, and understanding how student expectations and differing pedagogies between different HEIs interact to produce this result would require further exploration beyond the quantitative data approach we have taken here.

Possible explanations for the relationships observed
Although we have determined the generally negative relationships between the various league table metrics and WP, the data presented do not explain why such correlations exist. We anticipate that the reasons for the relationships between WP and the individual league table metrics are likely to be complex and to differ between metrics. For example, degree completion may be driven less by the HEI and more by the students' support network, such as friends, family and their fellow students (Wilcox et al. 2005). It may be that most students enrolled in universities that recruit from high participation areas expect to graduate, resulting in an environment where non-completion is seen as unusual. Another factor is that students from lower participation backgrounds are more likely to combine studying with paid employment or family care responsibilities; this may impact continuation and completion (Leese 2010; Reay et al. 2010). Students from underrepresented backgrounds are also more likely to fall victim to the "hidden curriculum" of HE, i.e. the norms, expectations and language of university study with which they are less familiar than students from privileged backgrounds (Margolis 2002), which may underpin some of the relationships observed.
However, it is less easy to see a direct causal relationship between the diversity of the student body and indicators such as Research Quality or Facilities Spend. Here there may be a more complex set of causalities; institutions that prioritise these activities are rewarded in the league tables, therefore gain prestige, become more attractive to applicants, therefore can set higher entry grades, and thus are less likely to be attended by those from disadvantaged backgrounds.

Levelling the playing field
We have demonstrated that WP can be incorporated into ULT calculations relatively simply using readily available data, and that this can reward institutions with a diverse student intake through improved rankings. Inclusion of a calibrated WP metric mitigates the bias against WP that we identified in the current league table calculations. Inclusion of a WP metric also has the effect of reducing some of the differences between university mission groups (particularly between the Russell Group and the University Alliance), somewhat levelling the playing field between universities with different institutional priorities.
In our context, the Russell Group does not place an emphasis on WP in their mission statement, whereas the MillionPlus and University Alliance groups both actively emphasise a WP component. In our revised league table, there is a positive relationship between WP and ULT performance (negative correlation with Gini coefficient) for all mission groups except the Russell Group, mirroring the emphasis on WP in the mission statements of the other groups. Our league table therefore allows universities to be recognised for their WP activity, with the league table better reflecting the extent to which institutions are successful in their specified WP mission. We believe this is a fairer way for institutions to be ranked, which allows a wider variety of types of "excellence" to be reflected in institutional prestige.
The stated aim of ULTs is to inform the choice of university applicants. Applicant characteristics have changed over time, and it is clear that Generation Z (see footnote 15) is much more involved with the social justice agenda than generations born in the late 20th century (Mohr and Mohr 2017). A recent report indicates that 72% of UK students think that universities should take educational background into consideration when making admissions decisions, and half think that those from deprived areas should be allowed in with lower grades than their more advantaged peers (Dale-Harris 2019). There is therefore a desire for "fairness" amongst current university applicants, and direct inclusion of WP in a transparent way through ULTs may influence student decision-making. The same marketisation forces that have driven the adoption of the NSS as a metric have also encouraged research into the Student Experience which suggests that today's students are seeking different things from their time in HE (Seemiller and Grace 2017). Inclusion of WP in league tables would turn an institution's often hidden WP efforts into a more transparent metric when it comes to student choice.
Existing academic literature is very clear that, within the specific context of HE in the UK, ULTs influence institutional decision-making through the mechanism of key performance indicators (KPIs; see footnote 16) (Hazelkorn 2007; Hosier and Hoolash 2019; Locke et al. 2008; Lynch 2015). We have demonstrated that universities fulfilling their legal and societal duty to widen participation are actively penalised by existing ULT indicators. For example, increasing entry requirements would increase ULT standing in both the Guardian and Complete University Guide ULTs, while at the same time excluding the kind of students from low participation areas who currently excel after being enabled to enrol by reduced-tariff offers (Hoare and Johnston 2011). This has serious market consequences, which in turn create incentives to diminish WP efforts. We anticipate that including a WP metric in ULTs would alter institutional decision-making; once WP is included in the league tables it may be more likely to be incorporated into institutional KPIs and result in more meaningful WP activity. Pursuit of such a metric would then be rewarded by reputational prestige.

Limitations
It should be noted that our modelling approach is based on using the POLAR-based Gini coefficient as a measure of WP. As mentioned earlier, we selected this measure because Gini coefficients can be generated from publicly available data for the majority of UK universities, and use an established robust methodology (Gini 1955 [1912]; Martin 2018). However, Gini coefficients may not be the most appropriate WP measure to be included in a WP-adjusted league table. POLAR has been criticised as a methodology because it is postcode-based, which makes it likely to underestimate the heterogeneity of HE participation within a geographical area, particularly in rural parts of the UK (HEFCE 2014).
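As a reminder of what a Gini coefficient measures, the sketch below computes a generic Gini coefficient from an institution's share of entrants in each of five participation quintiles. It uses the standard mean-absolute-difference form of the coefficient and hypothetical shares; it is an assumption-laden illustration, not a reproduction of the exact procedure in Martin (2018) or of the calculations in our analysis.

```python
def gini_coefficient(values):
    """Generic Gini coefficient: mean absolute difference divided by twice the mean."""
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0 or any(v < 0 for v in values):
        raise ValueError("values must be non-negative with a positive sum")
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    return abs_diff_sum / (2.0 * n * total)


# Hypothetical shares of entrants from each of five quintiles (illustration only).
equal_intake = [0.2, 0.2, 0.2, 0.2, 0.2]
skewed_intake = [0.10, 0.15, 0.20, 0.25, 0.30]
single_quintile = [1.0, 0.0, 0.0, 0.0, 0.0]

gini_equal = gini_coefficient(equal_intake)
gini_skewed = gini_coefficient(skewed_intake)
gini_single = gini_coefficient(single_quintile)
print(gini_equal, round(gini_skewed, 3), round(gini_single, 3))
```

An intake drawn equally from all quintiles gives a coefficient of zero, and the coefficient rises as the intake becomes more concentrated in particular quintiles.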
Assessing WP on the basis of geography also obscures the impact of other factors influencing HE participation, such as ethnicity or socio-economic status. The Scottish government uses an alternative measure within Scotland's HE sector, comparing areas across 7 different domains of deprivation, including education, health and income, to produce the Scottish Index of Multiple Deprivation (Scottish Government 2020). This and similar metrics can allow a more detailed breakdown of how specific deprivations affect HE intake, but still miss factors that affect individuals rather than geographic areas, such as ethnicity.
UCAS has developed the Multiple Equality Measure (MEM; see footnote 10) as a more sophisticated predictor of higher education participation, which includes ethnicity, school type and whether students receive free school meals (a marker of low socio-economic status), the geographic information provided by POLAR and a number of other predictors (UCAS 2018b). Since this intersectional measure of HE participation is calculated at the level of the individual rather than considering postcode, it allows for a more nuanced assessment of WP. However, these data are not currently publicly available by institution, hence our reliance on the POLAR-based Gini coefficients for our modelling approach. We would encourage league table providers to work with UCAS to incorporate MEM as a league table metric if possible. It should also be noted that for the sake of simplicity, our modelling adopts a "Divide by the largest value" approach to normalisation (Tofallis 2012), which contrasts with the S-score methodology (see footnote 17) used for other metrics in the Guardian ULT (Hiely-Rayner 2019). A variety of methods are used to normalise league table metrics. Tofallis (2012) highlights four different methodologies, and notes that the normalisation method chosen can have significant impacts on the subsequent ranking of institutions. We do not advocate our modelling approach as the most statistically robust method of league table calculation, but present it as a proof of concept that WP can and should be included in ULTs.
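The sensitivity of rankings to the choice of normalisation can be illustrated with a small sketch. It compares "divide by the largest value" with range (min-max) normalisation, which is chosen here purely as a contrasting method; it is not the Guardian's S-score, and the two metrics, their equal weighting and the institutions A, B and C are hypothetical.

```python
def divide_by_largest(values):
    """Scale each value by the largest value for that metric."""
    largest = max(values.values())
    if largest <= 0:
        raise ValueError("largest value must be positive")
    return {name: value / largest for name, value in values.items()}


def range_normalise(values):
    """Rescale each metric so the lowest value is 0 and the highest is 1."""
    lowest, highest = min(values.values()), max(values.values())
    if highest == lowest:
        raise ValueError("metric has no spread")
    return {name: (value - lowest) / (highest - lowest) for name, value in values.items()}


def ranking(metrics, normalise):
    """Rank institutions by the unweighted sum of their normalised metrics."""
    normalised = [normalise(metric) for metric in metrics]
    names = list(metrics[0])
    totals = {name: sum(column[name] for column in normalised) for name in names}
    return sorted(names, key=totals.get, reverse=True)


# Hypothetical raw metric values, for illustration only.
metric_1 = {"A": 100.0, "B": 95.0, "C": 90.0}
metric_2 = {"A": 40.0, "B": 60.0, "C": 50.0}

rank_by_largest = ranking([metric_1, metric_2], divide_by_largest)
rank_by_range = ranking([metric_1, metric_2], range_normalise)
print(rank_by_largest, rank_by_range)
```

With identical raw data, institutions A and C swap places depending on the normalisation chosen, because range normalisation magnifies the narrow spread of the first metric.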

Conclusions and recommendations
The compilation of university league tables is an inherently political act, with the choice and weighting of metrics resulting in particular characteristics of individual institutions being rewarded or penalised. One aspect of contemporary universities that is currently not considered by league tables is the diversity of the student intake, and the extent to which an institution has been successful in widening participation in HE. Taking the UK as a case study for ULT compilation, our analysis demonstrates the following:
(1) The existing ULT methodologies penalise institutions that successfully widen access to HE. We found 75% of the Guardian and 78% of the Complete University Guide metrics (by weighting) to have a negative relationship with WP, including the Guardian's "Value Added" metric, which might be expected to reward institutions with a more diverse student body.
(2) It is possible to include WP as a league table indicator in its own right. We have demonstrated this through the use of Gini coefficients based on publicly available POLAR data, but other as-yet unpublished indicators of WP may be more powerful tools.
(3) Including WP as a league table metric removes the negative sector-level relationship between WP and league table position. This also reduces the difference in ranking between institutions of different stated mission groups, and therefore levels the playing field between diverse institutions.
We find that, overwhelmingly, current ULT performance correlates negatively with a WP agenda. There is therefore little incentive for institutions to prioritise WP when it comes to league table performance. If anything, we find that a commitment to WP damages league table position, resulting in a direct conflict between increasing student diversity and institutional prestige (Shaw 2009). However, we also demonstrate that it is possible to adjust league table methodologies to include WP, and therefore reward institutions which actively recruit from a diverse student population. We therefore call upon league table compilers to revisit their conception of a "good university", so that it fully reflects the diversity of institution types and intended missions. The fact that two of the three league tables are compiled by newspapers has implications for bias in ULT compilation; journalists in the UK are highly likely to have attended an elite university and been educated at a private school (Sutton Trust 2019), so may have a narrower conception of a "good" university than the HE sector as a whole. In the UK, given the legal requirements for universities to widen access to HE, we believe it is appropriate and fair for WP to be included in league table metrics. In this study, using publicly available data and existing robust methodologies, we demonstrate that inclusion of WP is not only possible but also effective in making ULTs a fairer reflection of HEIs' efforts in widening participation.
Rewarding universities' social justice activities through improved league table positioning requires ULT compilers to make an active choice to do so. This necessitates reconsidering existing metrics for potential bias, and considering the adoption of new metrics to capture diversity and participation. While the UK has a clear statutory obligation to pursue WP as a policy objective in HE, we recognise that the legal situation in other nations may not stipulate this objective. It might therefore be technically challenging or less appropriate to incorporate WP-based metrics into other national or international league tables. Nevertheless, there are two very important technical lessons which international readers might draw from our analysis. First, that intuitively appealing data (in our case, the "Value Added" score) might not actually reward the behaviours intended (WP rates). Second, that university rankings, both in the activities they measure and reward, and their methodologies, can be challenged rigorously if academics are prepared to "get their hands dirty" and propose something better.
We call upon the academic community to actively lobby for changes to ULT compilation, rather than passively accepting a set of metrics that do not adequately value the diversity of the sector. However flawed they are, league tables and rankings currently drive institutional behaviours and student choice. There is a collective responsibility to ensure that the indicators in ULTs at least attempt to describe all the diverse characteristics of a quality university, which we believe in the UK context should include widening participation.