Despite being a key component of how identity is constructed in modernity [1], national identity is a contested construct. A distinction can be drawn between ethnic and civic ideas of national identity (e.g. Zimmer [2]). Ethnic, or organic, conceptions are based upon a shared cultural inheritance: language, social practices and, in some manifestations, ancestry. Here, national identities are not something that can be chosen, and are instead determined. Civic or creedal understandings of national identity emphasise instead a voluntary commitment to a nation, often in its sense as a community of shared values [1]—identity is something that can and should be chosen. In practice however, this distinction is often more ambiguous. Yack [3] juxtaposes the apparently civic Canadian and ethnic Québécois identities, before exploring how a Québécois identity must ultimately be chosen above an equally plausible Canadian identity and how a Canadian identity is contingent on historical connections with colonial Britain and France.

Similarly, in Wales, a constituent nation of the UK, all citizens have at least two plausible national identities: Welsh and British. Furthermore, according to the 2011 UK Census, 20.8% of residents of Wales were born in England and a further 5.5% were born outside the UK [4], suggesting other possible identities. In practice, 65.9% of residents of Wales identify as Welsh, with 57.5% identifying as Welsh-only. Significant minorities, however, identify as British, English or a variety of other nationalities. Furthermore, a wide range of hybrid identities also exist, the most common of which is Welsh and British (7.1% of the population).

As with most national identities, national identity in Wales has both ethnic and civic conceptions. Historically, Welsh identity, particularly in the context of Welsh-speaking communities, was framed by figures as celebrated as John Stuart Mill as backwards and ethnic, while an Anglophone British identity was framed as forward-looking and civic [5]. However, both identities had clear ethnic components: Welsh identity being associated with the Welsh language and Methodist chapels, and British identity being associated with the English language and the Anglican Church [6]. Following a narrow vote in 1999 in favour of devolving some political powers to Wales, Brooks argues that Welsh politicians have sought to strengthen an explicitly civic and inclusive Welsh identity but, in doing so, have implicitly problematised some of the political demands of Welsh language communities as ethnic and exclusive [7]. To further complicate this issue, there is evidence that some incomers to Welsh-speaking Wales see learning Welsh as a civic act, instead of the usual view of language as an ethnic characteristic [8].

Research into Welsh identity reveals that Welsh identity is not monolithic. This research has a striking focus on place, with Welsh identity being constructed differently across the country [9]. One of the most influential models of these Welsh identities is Balsom’s Three-Wales Model [10], developed in political science. The model divides Wales into three: Welsh Wales, Y Fro Gymraeg (‘The Welsh-speaking country’) and British Wales. Welsh Wales covers post-industrial South Wales, particularly the coalfield that was crucial to the area’s rapid development during the industrial revolution. This identity is strongly working class, generally Anglophone, and associated with voting for the Labour Party. Y Fro Gymraeg covers much of the rural west of Wales, where the Welsh language is a living community language. Here, Welsh identity is tied to speaking Welsh and, in political terms, to support for Plaid Cymru, the Welsh nationalist party who, in the 2016 Welsh Assembly elections, won all five constituencies in Y Fro Gymraeg but only one outside it. British Wales covers the remainder of the country and, along with support for the Conservative and Liberal Democrat parties, is associated with a less confident Welsh identity [11] relative to the previous two areas, which have competing claims to heartland status—Wales’ ‘two truths’ as Raymond Williams put it [12]. Thus, in Wales, competing forms of Welshness sit alongside each other, as well as British, English and other identities.

The Three-Wales Model itself, although an enduringly useful shorthand for the various Welsh identities, has a number of limitations. Firstly, it is a model of places and not people. Welsh speakers, for example, exist outside of Y Fro Gymraeg and the model is not clear on what to make of them. Furthermore, as Scully and Jones [13] point out, the majority of British identifiers live outside of ‘British Wales’ and only a minority in British Wales identify as British. Secondly, in proposing these regionally dominant identities, the model does not concern itself with groups that are not large enough to be a local majority. What, for instance, should be made of the ~15% of Welsh residents who do not express Welsh or British identities?

Welsh identity has been studied in its own right [9, 11] and from the perspective of electoral politics [10, 13], but, to date, not from the perspective of public health. However, there is ample reason to expect health disparities as a function of national identity in Wales. Firstly, the various Welsh identities proposed in the Three-Wales Model are highly classed, with working classness integral to Welsh Wales identity [11], while it has a more complex relationship with Welsh-speaking Welsh identities [14]. Social gradients are the rule rather than the exception in health [15], so those holding such classed identities may be at greater risk of poor health. Secondly, and relatedly, the various Welsh identities are strongly associated with particular geographical areas. Welsh Wales is strongly tied up with South Wales’ heritage of heavy industry, particularly coal mining, and poorer health in former coalfields has also been documented elsewhere [16, 17]. Y Fro Gymraeg, conversely, is predominantly rural. Rurality has been associated with better mental health in the UK [18] but may also lead to adverse consequences due to poorer access to healthcare [19]. Large geographical disparities also exist in ecological social capital, the presence of and access to resources embedded in social networks in a given locality, and these have also been associated with health outcomes [20, 21]. Thirdly, as described above, even explicitly civic national identities are often related to cultural, linguistic and other ethnic characteristics and such characteristics are often related to population health [22,23,24]. Fourthly, voluntary civic-type identities can also be related to health status [25].

The present paper will compare the self-rated general health and mental health of different national identity groups in Wales using a two-stage analysis of nationwide survey data. Data on national identities, Welsh language ability, ethnicity and area of residence will be clustered to identify a set of identity groups. These groups will then be compared in terms of general and mental health with and without adjusting for a number of demographic and geographical risk factors.


Ethics and Data Access

Ethical permission was obtained from the Bangor University School of Psychology Ethics Committee on September 14, 2020.


The present study used the 2017–2018 [26] and 2018–2019 [27] waves of the National Survey for Wales (NSfW; N = 11,381 and 11,922, respectively). The NSfW is a cross-sectional face-to-face survey run by the Welsh Government, covering a variety of topics, including healthcare use, arts participation, diet, alcohol use and knowledge of devolution. Potential respondents are sampled from randomly chosen households using postal address files, with the aim of being representative of people aged 16 and over living in residential households in Wales. Response rates for 2017–2018 and 2018–2019 were 55% and 54%, respectively.

Survey data were linked to area measures of poverty and population density at the lower super output area (LSOA) level, using respondent LSOA codes provided by the Welsh Government under a data access agreement. Quintile of the Welsh Index of Multiple Deprivation (WIMD; Welsh Government [28]) was used as a measure of poverty, and population density was obtained from the 2011 UK Census [4].


Six variables were used to identify the different identity groups. The first was frequency of speaking Welsh (responses: daily, weekly, less often, never and cannot speak Welsh), which was created by combining an item asking about ability with a second item on frequency of use (asked of those who reported being able to speak Welsh). The second was local authority (county) of residence. The third, fourth and fifth were whether respondents identified as Welsh, as British and as English; these three items came from a question where respondents could select as many identities as they wished, so were non-exclusive. The sixth was self-reported ethnicity, using an adaptation of the five group system used in the 2011 UK Census: White Welsh/English/Scottish/Northern Irish/British, White other, Mixed, Asian, Black and Other. White other and White Welsh/English/Scottish/Northern Irish/British were separated as they seemed likely to differ in terms of national identity.

For the analyses of health outcomes, three sets of variables were used. The demographic variables were gender, age (grouped into seven bins: 16–24, 25–34, 35–44, 45–54, 55–64, 65–74, 75+) and education (higher degree/postgraduate qualifications, first degree, A/AS levels, diplomas, etc.; O level/GCSE grades A–C, etc.; O level/GCSE grades D–G; other qualifications; trade apprenticeships; foreign qualifications; no qualifications). The individual socio-economic variables were self-reported income (less than £10,400 a year, £10,400 to £20,799 a year, £20,800 to £31,099 a year, £31,100 to £41,499 a year, £41,500 or more a year), material deprivation (a binary measure of whether a respondent is materially deprived, based on their responses to a number of questions about whether or not they can afford various items) and perceived financial pressure. The latter was measured with the question “Which one of the statements on this card best describes how well you [and your family/and your partner] are keeping up with your bills and credit commitments at the moment?”, with the following response options: “Keeping up with all bills and commitments without any difficulties”, “Keeping up with all bills and commitments but it is a struggle from time to time”, “Keeping up with all bills and commitments but it is a constant struggle”, “Falling behind with some bills or credit commitments”, “Having real financial problems and have fallen behind with many bills or credit commitments” and “Have no bills”. The area-level variables were the WIMD quintile of the respondent’s LSOA of residence, the population density of their LSOA of residence and the region of the Three-Wales Model in which they lived: Y Fro Gymraeg (Ynys Môn, Gwynedd, Ceredigion and Carmarthenshire), Welsh Wales (Swansea, Neath Port Talbot, Rhondda Cynon Taf, Merthyr Tydfil, Caerphilly, Blaenau Gwent, Torfaen) or British Wales (Conwy, Denbighshire, Flintshire, Pembrokeshire, Wrexham, Powys, Monmouthshire, Newport, Vale of Glamorgan, Bridgend and Cardiff). Note that Balsom’s original delineation was on the basis of UK Parliamentary constituencies in the 1980s; thus, drawing the boundaries on the basis of local authorities may lead to some differences, but these are very minor.
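The assignment of local authorities to Three-Wales Model regions described above can be written as a simple lookup. The following is a minimal Python sketch for illustration (the analyses themselves were run in R); the local authority lists are taken from the text, while the names `THREE_WALES_REGIONS`, `REGION_BY_AUTHORITY` and `three_wales_region` are hypothetical.

```python
# Local authorities in each Three-Wales Model region, as listed in the text.
# All identifiers here are illustrative, not taken from the study's own code.
THREE_WALES_REGIONS = {
    "Y Fro Gymraeg": [
        "Ynys Môn", "Gwynedd", "Ceredigion", "Carmarthenshire",
    ],
    "Welsh Wales": [
        "Swansea", "Neath Port Talbot", "Rhondda Cynon Taf",
        "Merthyr Tydfil", "Caerphilly", "Blaenau Gwent", "Torfaen",
    ],
    "British Wales": [
        "Conwy", "Denbighshire", "Flintshire", "Pembrokeshire", "Wrexham",
        "Powys", "Monmouthshire", "Newport", "Vale of Glamorgan",
        "Bridgend", "Cardiff",
    ],
}

# Invert the mapping so each local authority points to exactly one region.
REGION_BY_AUTHORITY = {
    authority: region
    for region, authorities in THREE_WALES_REGIONS.items()
    for authority in authorities
}


def three_wales_region(local_authority: str) -> str:
    """Return the Three-Wales Model region for a local authority name."""
    return REGION_BY_AUTHORITY[local_authority]
```

Because the lookup is keyed on local authority rather than on the 1980s Parliamentary constituencies, it inherits the small boundary differences noted above.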

Two domains of health were analysed. First, self-reported general health was measured using the item “How is your health in general; is it…”, with the following response options: “Very good”, “Good”, “Fair”, “Bad” and “Very bad”. Following previous studies [29,30,31], the two ‘good’ responses were coded as zero and the other three categories were coded as one, so models estimated the risk of ‘not good’ health. Second, mental health was measured using the item “Do you have any physical or mental health conditions or illnesses lasting or expected to last for 12 months or more?”. Respondents whose responses were coded under the category “mental disorders” were scored as 1, and all others were scored as 0.
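The outcome coding just described can be sketched in a few lines of Python (illustrative only; the analyses were run in R). The function names and the representation of a respondent's coded conditions as a list of category labels are assumptions made for this sketch, not features of the survey files.

```python
GOOD_RESPONSES = {"Very good", "Good"}
NOT_GOOD_RESPONSES = {"Fair", "Bad", "Very bad"}


def code_general_health(response):
    """Return 1 for 'not good' health, 0 for good health, None if missing."""
    if response in GOOD_RESPONSES:
        return 0
    if response in NOT_GOOD_RESPONSES:
        return 1
    # Anything else is treated as missing and left for imputation.
    return None


def code_mental_health(condition_categories):
    """Return 1 if any coded long-term condition is a mental disorder.

    `condition_categories` is assumed to be a list of category labels
    for one respondent; an empty list means no condition was reported.
    """
    return 1 if "mental disorders" in condition_categories else 0
```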


Latent Class Analysis

There were two stages of analysis. Firstly, latent class analysis was used to divide respondents into the appropriate number of identity groups; secondly, the self-reported general health and mental health of these groups were compared, both crudely and after adjusting for a number of different factors, detailed below. All analyses were run in R [32].

Before latent class analysis, missing data were imputed using the Amelia package [33] for R. All variables to be used in the health analyses below, as well as the three national identity variables, ethnicity, frequency of speaking Welsh and local authority, were included as predictors. A single version of the imputed data was then used for the latent class analysis. A full multiple imputation was carried out before the health analyses, so as to incorporate the output of the latent class analysis.

Latent class analysis was run using the poLCA function in the R package of the same name [34]. Solutions including between one and nine groups were fitted to the data. Each solution was allowed up to 100,000 iterations and was fitted with twenty different start points to avoid local optima.

Model selection can be difficult in latent class analysis. Simulation studies suggest that the Bayesian information criterion (BIC) and bootstrapped likelihood ratio tests are the best approaches to determining the best fitting model [35, 36]. However, these studies had sample sizes of only 1000 in the largest simulated samples, which is an order of magnitude lower than used here. Even assuming the same underlying structure, larger sample sizes will lead to a greater number of classes being supported [37]. Thus here, BIC was plotted in the style of a scree plot to identify the point of inflection, after which adding additional classes does not lead to substantial changes in BIC. Models after this point were also examined to see how they differed, to ensure that the model space was well understood.
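The scree-style reading of BIC can be illustrated with a minimal Python sketch (the analyses themselves were run in R). The parameter count and BIC formula are the standard ones for latent class models with categorical indicators. The elbow rule, its `flat_fraction` threshold and all function names are hypothetical stand-ins for the visual inspection used here, and the BIC values in the example are invented purely for illustration.

```python
import math


def lca_free_parameters(n_classes, categories_per_item):
    """Free parameters in a latent class model with categorical items.

    (n_classes - 1) class proportions, plus (K - 1) response
    probabilities per item with K categories, within each class.
    """
    return (n_classes - 1) + n_classes * sum(k - 1 for k in categories_per_item)


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate better fit."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def elbow_class_count(bics, flat_fraction=0.1):
    """Heuristic elbow: bics[i] is the BIC of the (i + 1)-class model.

    Returns the first class count after which adding one more class
    improves BIC by less than flat_fraction of the largest one-step drop.
    This is an assumed rule of thumb, not the procedure used in the paper.
    """
    drops = [bics[i] - bics[i + 1] for i in range(len(bics) - 1)]
    largest = max(drops)
    if largest <= 0:
        return 1
    for i, drop in enumerate(drops):
        if drop < flat_fraction * largest:
            return i + 1
    return len(bics)


# Category counts per indicator as described in the text: Welsh frequency (5),
# local authority (22), three binary identity items, ethnicity (6).
ITEM_CATEGORIES = [5, 22, 2, 2, 2, 6]

# Invented BIC values for one to nine classes, for illustration only.
illustrative_bics = [1000.0, 800.0, 650.0, 540.0, 470.0, 465.0, 462.0, 463.0, 466.0]
elbow = elbow_class_count(illustrative_bics)
lowest = illustrative_bics.index(min(illustrative_bics)) + 1
```

With these invented values, the elbow falls at a different class count from the BIC minimum, which is the situation the scree-style approach is designed to handle.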

Analyses of Health

Prior to the health analyses, missing data were multiply imputed using the Amelia package [33] for R. This step was carried out after latent class analysis, ensuring that the latent class was present in the imputation model to avoid bias. Five imputations were made of missing data, using gender, education, material deprivation, dichotomised general health, mental health, ethnicity, Three-Wales Region, financial stress and the latent class variable as nominal variables; income band, WIMD quintile and age group as ordinal variables; and sampling weight and population density as numeric variables.

The health of the resulting groups was compared using a series of generalised linear mixed-effects models. The strategy was to fit a series of models with increasing levels of adjustment for health-relevant individual and area-level factors, in order to describe the health disparities as they appear in the population but also to explore the extent to which any disparities can be explained by age, class and geography. The goal here is not to infer causal relationships between these factors and health, but to compare groups unconditionally and conditionally on some obvious health-relevant factors.

For each dependent variable, general health and mental health, seven binomial logit-linked mixed-effects models were fitted. Model 1 included only the latent class analysis derived group, plus a random intercept of LSOA nested within local authority. Model 2 was as model 1 but added gender and age group. Model 3 was as model 2 but added education and income, and model 4 was as model 3 but added area-level poverty. Model 5 was as model 4 but added area-level population density (z-scored). Model 6 was as model 5 but added the Three-Wales Model region of respondents. Model 7 was as model 6 but added perceived financial pressure and material deprivation, added last as these represented respondents’ interpretations of their financial situation and so were more vulnerable to reverse causation issues with mental health. In all models, residuals were weighted by the provided sampling weights. Collinearity was assessed using the check_collinearity function of the performance package [38] for R.
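The cumulative structure of the seven models can be made explicit with a short Python sketch that builds each model's fixed-effect terms from the previous one. The variable names are hypothetical, and the formula strings use the notation common to R mixed-model packages such as lme4, where `(1 | a/b)` denotes a random intercept for `b` nested within `a`; the text does not name the fitting package, so this notation is an assumption.

```python
# Terms added at each step, in the order described in the text.
# Variable names are illustrative, not those of the survey data set.
MODEL_ADDITIONS = [
    ["identity_group"],                              # model 1
    ["gender", "age_group"],                         # model 2
    ["education", "income"],                         # model 3
    ["wimd_quintile"],                               # model 4
    ["population_density_z"],                        # model 5
    ["three_wales_region"],                          # model 6
    ["financial_pressure", "material_deprivation"],  # model 7
]

RANDOM_EFFECT = "(1 | local_authority/lsoa)"


def cumulative_models(additions):
    """Return one list of fixed-effect terms per model, each extending the last."""
    models, terms = [], []
    for step in additions:
        terms = terms + step
        models.append(list(terms))
    return models


def model_formula(outcome, terms):
    """Build an lme4-style formula string for one model."""
    return f"{outcome} ~ " + " + ".join(terms) + f" + {RANDOM_EFFECT}"


MODELS = cumulative_models(MODEL_ADDITIONS)
FORMULAS = [model_formula("not_good_health", terms) for terms in MODELS]
```

Building the models cumulatively makes it straightforward to see which adjustment step is responsible for any change in the group estimates.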

In order to account for the multiple imputation, five iterations of each model were run, one for each imputation, and the point estimates and standard errors were pooled using Rubin’s rule [39], as implemented in the mi.meld function of the Amelia package.
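For readers unfamiliar with the pooling step, the following minimal Python sketch shows Rubin's rules for a single coefficient. It is an illustration under the usual formulation (pooled estimate as the mean across imputations; total variance as within-imputation variance plus an inflated between-imputation variance), not a reproduction of the mi.meld code; the function name `pool_rubin` is hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, standard_errors):
    """Pool one coefficient across m imputed data sets using Rubin's rules.

    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    pooled = mean(estimates)
    # Within-imputation variance: average squared standard error.
    within = mean([se ** 2 for se in standard_errors])
    # Between-imputation variance: sample variance of the point estimates.
    between = variance(estimates) if m > 1 else 0.0
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)
```

For models on the log-odds scale, pooling would be applied to the coefficients before exponentiating, so that the pooled standard error refers to the scale on which the estimates are approximately normal.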


Missing Data

From a total of 23,303 respondents to the 2017–2018 and 2018–2019 surveys, 16,764 respondents had all variables available. Seven respondents lacked data on gender, 216 lacked education data, 6195 lacked data on income, 192 were missing data on material deprivation, 280 lacked data on financial pressure, 24 lacked ethnicity data, eight lacked data on Welsh language, 59 lacked data on general health and 162 lacked data on mental health. Note that the question on income was introduced partway through the 2017–2018 fieldwork period; income data were missing for 3051 respondents interviewed during the period when the question was asked. Missing data were imputed as described above.

Latent Class Analysis

Figure 1 shows a scree plot of BIC for the nine fitted models. BIC is lowest for a seven-class solution, but after model 5, BIC plateaus markedly. Thus, the main analyses reported will be for the five-class model.

Fig. 1 Bayesian information criteria for the nine latent class analyses. BICs decline steeply from models 1 to 5 and then plateau, although the lowest score is for model 7

The five-class model is summarised in Table 1. In descending order of population share, the classes are as follows: Anglophone Welsh, British, Cymry Cymraeg (Welsh-speaking Welsh), English and Ethnically Diverse. See Fig. 2 for the distribution of each group between local authorities.

Table 1 Composition of each identity group in terms of variables used in latent class analysis and in analyses of health

Fig. 2 Proportions of each local authority that each identity group makes up

Class 1: Anglophone Welsh

This was the largest group at 44% of the sample. They identified as Welsh, with a minority also identifying as British. Although ~23% reported being able to speak Welsh, most of these reported speaking it infrequently. This group was concentrated in the more urban south but was widespread outside of the northwest (see Fig. 2).

Class 2: British

This was the second largest group at 28% of the sample. They identified as British, with small minorities also identifying as Welsh or English. They were the second most likely group to speak Welsh but were still largely monoglot Anglophone. They were concentrated in the counties of Powys and Monmouthshire and rare in the South Wales Valleys.

Class 3: Cymry Cymraeg

This was the third largest group at 12% of the sample. They generally identified as Welsh, and not British or English. Most (73%) spoke Welsh daily. They were concentrated in ‘Y Fro Gymraeg’: Ynys Môn, Gwynedd, Ceredigion and Carmarthenshire, particularly Gwynedd.

Class 4: English

This was the next largest group at 11% of the sample. Most identified as English and a few as British or Welsh. Very few reported speaking Welsh. This group was concentrated in the counties of Conwy, Denbighshire and Flintshire along the north coast.

Class 5: Ethnically Diverse

This was the smallest group, making up 4% of the sample. About a quarter identified as British, but few identified as Welsh or English. Very few reported speaking Welsh. Unlike the other four groups, which overwhelmingly reported ‘White Welsh/English/Scottish/Northern Irish/British’ ethnicity, this group reported a wide range of ethnicities. About 45% were White other. The next largest cluster (~ 24%) reported Asian ethnicities, with the remainder reporting Black, Arab or other ethnicities. They were highly concentrated in Cardiff, with smaller pockets in Swansea, Newport, Wrexham and Flintshire.

Differences with Alternative Models

In the six- and seven-class models, a small ‘Cymrophone British’ class emerged, drawing from the Cymry Cymraeg and British groups. In the seven-class solution, the Ethnically Diverse group was divided into a group which was predominantly ‘White other’ in ethnicity and which did not identify as Welsh, British or English and a group which was largely people of colour, half of whom identified as British.

Group Demographics

Table 1 also compares the groups on demographic, socio-economic and geographical factors. On age, two groups stood out: the Ethnically Diverse, who were younger than other groups, and the English, who were older. On education, the Ethnically Diverse had a strikingly high proportion of respondents with higher degrees and high rates of first degrees and foreign qualifications. The English and the Anglophone Welsh, in contrast, had lower rates of degree qualifications and higher rates of no qualifications. On income, the British and the Cymry Cymraeg were less likely to be in the lowest income bracket, while the British were more likely to be in the highest bracket, and the Ethnically Diverse group were over-represented at both extremes of the income scale. On material deprivation, the Cymry Cymraeg were less likely to be materially deprived while the Ethnically Diverse were more likely. There were no major differences in perceived financial pressure, except for a tendency for the Cymry Cymraeg to be more likely to report not having bills.

Geographically, the Ethnically Diverse and the Anglophone Welsh were over-represented in the most deprived quintile of LSOAs, while the Cymry Cymraeg were under-represented, and the British were over-represented in the least deprived quintile. The Cymry Cymraeg group was much more likely to live in the lowest density quintile of LSOAs, while the Ethnically Diverse group was less likely, with the reverse being the case for the highest density LSOAs.

Analyses of Health

General Health

Table 2 and Fig. 3 display odds ratios (ORs) for reporting ‘not good’ health in the various models. Unadjusted, there are striking disparities in self-reported health: the Cymry Cymraeg, British and Ethnically Diverse groups had much lower rates of not good health than the Anglophone Welsh (reference) group, with the English looking similar to the Anglophone Welsh. These disparities were slightly attenuated in the adjusted models: adjusting for age accounted for some of the reduced risk of the Ethnically Diverse group and the increased risk of the English, while adjusting for education and income accounted for some of the reduced risk of the British and Cymry Cymraeg. Generally, however, risk remained lower for the British, Cymry Cymraeg and Ethnically Diverse groups. The confidence intervals for the British group’s OR overlapped with 1 in some models, but the ORs for the Ethnically Diverse and Cymry Cymraeg groups remained comfortably below 1 across all models.
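As a reading aid for the odds ratios reported here, the following minimal Python sketch shows how an OR and its 95% confidence interval relate to a coefficient and standard error on the log-odds scale. It assumes Wald-type intervals, which the text does not state explicitly; the function names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se, z=Z_95):
    """Convert a log-odds coefficient and its SE to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def implied_coefficient(lower, upper, z=Z_95):
    """Recover (beta, se) from a Wald interval reported on the OR scale.

    A Wald interval is symmetric on the log scale, so the point estimate
    is the geometric mean of the two bounds.
    """
    log_lower, log_upper = math.log(lower), math.log(upper)
    beta = (log_lower + log_upper) / 2
    se = (log_upper - log_lower) / (2 * z)
    return beta, se
```

An interval that includes 1 on the OR scale corresponds to an interval that includes 0 on the log-odds scale, which is why overlap with 1 is the benchmark used when describing the group differences.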

Table 2 Odds ratios and 95% confidence intervals for each term in each of the seven models for general health

Fig. 3 Odds ratios, with 95% confidence intervals, for reporting ‘not good’ general health for each group, relative to the Anglophone Welsh group, across all seven models

Mental Health

Table 3 and Fig. 4 display ORs for reporting a mental health problem in the seven models. Again, the Cymry Cymraeg and, especially, the Ethnically Diverse groups had reduced risks in all models. The English had reduced risk in the unadjusted model, but adjusting for age gave them a similar risk to the Anglophone Welsh. The British had a decreased risk in model 2 only, suggesting their risk was low, considering their age and gender profile, but this was accounted for by the socio-economic variables.

Table 3 Odds ratios and 95% confidence intervals for each term in each of the seven models for mental health

Fig. 4 Odds ratios, with 95% confidence intervals, for reporting a mental health problem for each group, relative to the Anglophone Welsh group, across all seven models

Post hoc Analyses

Given that the Ethnically Diverse group might hide heterogeneity between different ethnic groups, modified versions of model 1 for general and mental health were fitted to just the members of this group, with fixed effects of self-reported ethnicity replacing those of group. As the largest group, White other was used as the reference category.

General Health

With the exception of the Mixed group, all OR confidence intervals overlapped with 1.00. ORs for the Asian (OR = 1.24, 0.78–1.97), Black (OR = 1.05, 0.51–2.15), White Welsh/English/Scottish/Northern Irish/British (OR = 1.54, 0.64–3.71) and Other (OR = 1.31, 0.73–2.34) respondents in the Ethnically Diverse group trended towards poorer health, while the Mixed ethnicity respondents reported strikingly poorer health (OR = 4.60, 1.08–19.56).

Mental Health

Asian members of the Ethnically Diverse group reported better mental health than the reference group (OR = 0.25, 0.07–0.90). Point estimates of risk were lower for Black respondents, but CIs overlapped substantially with 1 (OR = 0.35, 0.04–2.90). Mixed (OR = 2.42, 0.04–126.45), White Welsh/English/Scottish/Northern Irish/British (OR = 2.98, 0.50–17.78) and Other (OR = 1.64, 0.49–5.47) respondents had higher point estimates for mental health risk, but again, all CIs overlapped substantially with 1.

It should be noted that sample sizes within this analysis were very low, making it very plausible that some of these non-significant associations are false negatives. It should also be noted that the majority of Mixed ethnicity respondents were assigned to groups other than the Ethnically Diverse group by the latent class analysis, and those respondents in this group should not be assumed to be representative of Mixed ethnicity people in Wales.


The present paper demonstrates striking health disparities between the various national identity groups of Wales, which are not explained by obvious socio-demographic or geographic differences between the groups.

Latent class analysis identified five identity groups in Wales. Three map approximately, with caveats, onto Balsom’s Three-Wales Model, while two are novel. The Anglophone Welsh are a much broader group than Balsom’s Welsh Wales. They comprise a plurality across most of the country, including in counties that Balsom identified as British Wales, such as Wrexham and Pembrokeshire. Despite this broader conception, they remained educationally disadvantaged compared to the Cymry Cymraeg, Ethnically Diverse and British, although not the English, and more likely to be materially deprived than the Cymry Cymraeg and British, although at lower risk than the Ethnically Diverse.

The British were distributed quite differently to Balsom’s model, with the distribution looking strikingly similar to that of the English, but more in rural and less deprived areas. Although they were geographically co-located with the English, they tended to have higher levels of education and higher incomes. It is tempting to speculate that both groups represent English-born residents of Wales, with the distinction between them being largely one of socio-economic class, echoing previous work showing a class dimension to English/British identification [40]. However, given that these two groups represented nearly 40% of the sample and the UK Census suggested that only 20.8% of the Welsh population was born in England (and many of them identify as Welsh), this does not seem likely. Although English-born residents were more likely to identify in the census as British (including combinations like Welsh and British) than Welsh-born residents (41% compared to 21%), the majority of people identifying as British are Welsh-born.

The Cymry Cymraeg closely resembled Y Fro Gymraeg from the Three-Wales Model, characterised by their use of the Welsh language, tendency to live in the Y Fro Gymraeg counties (although not exclusively) and identification as Welsh. This group had greater rates of higher pay and advanced qualifications than the Anglophone Welsh. They also tended not to live in deprived areas. It should be noted that the Welsh language variable in the latent class analysis measured use of Welsh, rather than just the ability to speak it. This likely led to a smaller and more specific Cymry Cymraeg group than a more inclusive definition based on ability, with substantial minorities of those able to speak Welsh falling into the Anglophone Welsh and British groups. This paper, thus, perhaps takes the implicit stance that a language’s importance to identity comes from its practice, rather than simply ability.

The English were not featured in the Three-Wales Model but are clearly distinct from the British in terms of national identity and socio-economic status. As mentioned above, they looked somewhat like the British in terms of geographical distribution, albeit living in more deprived LSOAs and more concentrated on the north coast. They were the oldest group and were relatively financially disadvantaged. Data were not available on country of birth, but it is likely that most of this group was born in England and migrated to Wales.

Finally, the Ethnically Diverse group was the second addition. As well as the obvious heterogeneity in ethnicity, they also, strikingly, had the highest proportion of any group in both the highest and lowest income bands. There were, however, commonalities. They were by some distance the most highly educated, youngest and most urban group. While the English would likely have been part of British Wales in the Three-Wales Model, the Ethnically Diverse group feels like an entirely novel addition. It is worth reiterating that this group should not be read as a ‘people of colour group’: the plurality of the group fall under the ‘White Other’ group in the classification system used in UK official data.

The ‘Five-Wales model’ used here and Balsom’s Three-Wales Model have clear similarities, but also important differences. Firstly, although the present model incorporates geographical information, it is people, rather than areas, that are classified. The Three-Wales Model conversely classifies regions, treating their residents as monolithic groups. Thus, not only is the present model able to identify less populous groups, but it is also less vulnerable to the ecological fallacy when used in practice. Secondly, the present approach is data-driven in its classification, while the Three-Wales Model draws on an attempt to synthesise what Raymond Williams called Wales’ ‘two truths’ [12]: the industrial labourist tradition of South Wales and the Welsh-speaking culture of Y Fro Gymraeg. While the latter certainly captures something of the ways that Wales has been depicted culturally and artistically, the present approach is perhaps more suitable for empirical research.

In terms of general health, the Anglophone Welsh and English had the worst outcomes, with the British trending towards slightly better health, depending on the model in question, and the Cymry Cymraeg and Ethnically Diverse groups reporting much better outcomes. For mental health, similar results were found, except that the British looked more similar to the Anglophone Welsh, and the reduced risk in the Ethnically Diverse group was even more pronounced.

Possible Mechanisms

As outlined in the “Introduction”, there were clear reasons to expect that the Anglophone Welsh group would have poorer health than the other groups. That being said, as this group was far broader than its more valleys-focused counterpart in Balsom’s Three-Wales Model, explanations based solely on post-industrial health risks are likely insufficient. Models 3–7 accounted for various components of socio-economic status and large health disparities largely endured. Although it is likely that some of the remaining disparities represented residual confounding, the remaining disparities are large enough that other factors are likely also responsible. Indeed, the associations between the identity groups and health remain reasonably stable after model 3, when education and income are included, even when other measures of poverty are added, which seems inconsistent with an explanation solely based on residual confounding by socio-economic status.

The Cymry Cymraeg group had, as expected, lower rates of health and mental health problems than the Anglophone Welsh group, results that are only partly explained by differences in socio-economic status. The results are reminiscent of work comparing the Finnish-speaking majority and Swedish-speaking minority in Finland [41], where the better health of the Swedish-speaking minority is partly attributable to greater social capital. This is also a possibility in Wales, where geographical variability in social capital favours some of the rural areas where Welsh speakers predominantly live, and where social capital is particularly low in the South Wales Valleys [20, 21]. More broadly, the degree of cultural assimilation or cultural distinctiveness has been shown to be related to health in Japanese-Americans [23] and, potentially, these differences represent something similar.

The group with the lowest levels of poor health was the Ethnically Diverse. This is perhaps a surprising finding, given that in the broad literature on ethnic health disparities, it is more common for such disparities to favour ethnic majorities (e.g. [42]). It is also important to highlight that other work in Wales has found health disparities to the detriment of people of colour, such as during the COVID-19 pandemic [43]. However, it is again important to remember that the plurality of this group fell into the ‘White other’ category. Post hoc analyses found some evidence for heterogeneity of outcomes by ethnicity within the Ethnically Diverse group, but given the small size of the subgroups in question, further work is needed on this topic.

Although data on country of birth were not available, it may be that a healthy migrant effect [44] accounts for some of the Ethnically Diverse group’s health advantage. This contrasts with the English, the other group in which one might speculate that a substantial proportion are migrants to Wales, but who had much poorer health. Health selection effects by migration, previously identified elsewhere in Wales [45], likely depend on the reasons for migration, which, in turn, likely vary geographically. The Ethnically Diverse group is concentrated in Cardiff, a major city, while the English are concentrated along the north coast, a popular retirement location. Further work is needed to characterise this apparent geographic heterogeneity in health selection effects.


The analyses have a number of limitations and caveats that are worth highlighting. Firstly, the use of latent class analysis could be questioned: why not simply model the health outcomes of those identifying with different nationalities? Such an approach would have had two important limitations. It would have conflated the Anglophone Welsh and Cymry Cymraeg groups, both ignoring the previous work showing these groups to be meaningfully distinct [9, 10] and missing the health inequalities between them. It would also not have integrated the information on geography into the model, which previous work has shown is important for Welsh identity. The use of latent class analysis, conversely, allowed the model to be informed by previous work, while still allowing it to deviate where this was a poor fit for the data, for example in the broader conception of the Anglophone Welsh group than Balsom’s Welsh Wales construct. That being said, it is important to reiterate that the five-class model serves as a useful heuristic, rather than being proposed as the definitive model of Welsh identity. As a case in point, the six- and seven-class models included further plausible classes: the Welsh-speaking British, and a division of the Ethnically Diverse group into a substantially British-identifying group of people of colour and a group who did not identify with any of the provided national identities, with predominantly ‘White other’ ethnicities.

One could also question the choice of variables that were used in the latent class analysis. A notable omission was class. Here, the decision was made to use proxies for class in the second stage of analysis, to assess the extent to which group differences could be explained by class. Including class at both stages seemed to risk circularity: defining groups (partly) by class and then testing whether the differences between them could be explained by class. Another potentially contentious decision was using a variable measuring respondents’ use of the Welsh language, rather than just the ability to speak it, as mentioned above.

Another possible criticism is that the Ethnically Diverse group represented a ‘none of the above’ category, rather than a coherent group. It is certainly clear that this group does not represent a single national identity. That being said, this approach can be justified on a number of grounds. Firstly, there were no good alternatives. Excluding this group or attempting to merge it into one of the other groups would have been both clearly unsatisfactory and ad hoc. Furthermore, a limited number of response options were available for the questions concerning national identity, so it would not have been possible to make some of these distinctions using the available data anyway. Some exploratory post hoc analyses comparing different ethnic identities within the group were run, but these were underpowered due to the limited sample size. More positively, the comparisons between the Ethnically Diverse group and the others reveal some striking and interesting health disparities. Furthermore, although this group was clearly heterogeneous in terms of ethnicity, and probably in terms of national identity, there were also commonalities, namely youth, high education and urbanicity.

Finally, it should be emphasised that this is a piece of descriptive epidemiology, and any discussion of the causal mechanisms that underlie these health disparities is necessarily speculative. Identifying causal effects in a context like this is challenging, but possible [25, 46], and will hopefully be the subject of future work.


National identity has been a surprising absentee from social epidemiology research, given that identities, both ethnic and civic, have clear relationships to health. This work confirms this empirically, finding wide disparities among five national identity groups in Wales. Wales is an ideal venue for this research, as part of a multi-national state where national identity is ambiguous and negotiated, but it is likely representative of many other nations in this regard. National identity remains a powerful social force in the twenty-first century, and health is part of that story.