The first stage of our analysis was to assemble a database of census variables that were collected in both the 2001 and 2011 censuses. The sixty attributes used to build OAC 2011 were selected initially, and then refined to a subset of 55 variables that were also available in 2001 under a consistent set of definitions (Table 1). To aid this process, the analysis was also restricted to England, given the different and changing remits of the census between UK countries and time periods. As such, our specification of variables was essentially guided by issues of data availability, but we believe that the small deviation from the 2001 and 2011 OAC specifications maintains the general-purpose nature of the hybrid analysis that follows.
A largely common set of output area zones was used. Output areas each contained an average of 300 people and 130 households in 2011, and are the smallest zonal geography for which comprehensive census attributes are released in the UK. The vast majority of the 2011 zones are the same as those used in the 2001 census outputs, with approximately 2.6 % of zones formed by either merging or splitting to reflect underlying population changes during the intercensal period (see Footnote 2). Such local changes to census geography are, of course, themselves an indicator of likely change in social, economic and demographic circumstances. In the analysis that follows, we supplement the evidence from changing administrative geography with evidence of changing geodemographic characteristics of census areas throughout the study area. The target zonal geography used for this analysis was that of the 2011 output areas, identified using a lookup table available from the Office for National Statistics (see Footnote 3). Reconciling the 2001 data with the complete set of 2011 boundaries entailed summing the 2001 totals of constituent zones that had been merged in 2011, and apportioning 2001 zone totals in proportion to area where 2001 zones had been divided for the purposes of the 2011 census.
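The reconciliation step can be illustrated with a minimal Python sketch. It is not the authors' code (the analysis was carried out in R), and the lookup layout, the zone codes and the `area_share` field are assumptions made for illustration: each row gives a 2001 zone, the 2011 zone it maps to, and the fraction of the 2001 zone's area falling inside that 2011 zone.

```python
def reconcile_to_2011(counts_2001, lookup):
    """Re-express 2001 zone counts on the 2011 zonal geography.

    counts_2001: {zone_2001: count}
    lookup: iterable of (zone_2001, zone_2011, area_share), where area_share
            is the (assumed) fraction of the 2001 zone's area that lies in
            the 2011 zone. Merged zones have a share of 1.0 each.
    """
    totals_2011 = {}
    for zone_2001, zone_2011, area_share in lookup:
        contribution = counts_2001[zone_2001] * area_share
        totals_2011[zone_2011] = totals_2011.get(zone_2011, 0.0) + contribution
    return totals_2011


# Hypothetical zones: A and B were merged into X in 2011;
# C was split into Y (60 % of its area) and Z (40 %).
counts_2001 = {"A": 120, "B": 180, "C": 500}
lookup = [("A", "X", 1.0), ("B", "X", 1.0), ("C", "Y", 0.6), ("C", "Z", 0.4)]
reconciled = reconcile_to_2011(counts_2001, lookup)
```

Merged zones are handled by simple summation, while split zones have their 2001 totals divided in proportion to area, so the overall 2001 total is preserved.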
The 2001 and 2011 census input data were thus rendered compatible with the 2011 output area geography, resulting in two records for each area: one for 2001 and one for 2011. For each area, inputs were calculated as percentages, with the exception of population density and standardised limiting long-term illness. As with the creation of the 2011 OAC, the data were then transformed using an inverse hyperbolic sine transformation (Johnson 1949), in order to return inputs that were more normally distributed, with the aim of aiding cluster identification using k-means. Prior to clustering, the data required standardisation onto the same scale and so, in common with the 2011 OAC, variables were range-standardised onto a 0–1 scale. The final input data set comprised 55 variables and 342,744 output area records, equal to twice the number of 2011 output areas within England.
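The preparation of a single input variable can be sketched as follows. This is an illustrative Python outline rather than the authors' R code; the function names are hypothetical, and it assumes that range standardisation is applied to the pooled 2001 and 2011 records for each variable.

```python
import math


def to_percentage(count, denominator):
    """Express a count as a percentage of its denominator."""
    return 100.0 * count / denominator if denominator else 0.0


def ihs(x):
    """Inverse hyperbolic sine: log(x + sqrt(x^2 + 1))."""
    return math.asinh(x)


def range_standardise(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant variable carries no information
    return [(v - lo) / (hi - lo) for v in values]


def prepare_variable(counts, denominators):
    """Percentage, then inverse hyperbolic sine, then 0-1 range standardisation."""
    pct = [to_percentage(c, d) for c, d in zip(counts, denominators)]
    return range_standardise([ihs(p) for p in pct])


# Hypothetical counts for three areas, each with 100 residents.
scaled = prepare_variable([10, 50, 90], [100, 100, 100])
```

Because the inverse hyperbolic sine compresses large values, the middle area in this example lands above 0.5 after standardisation, which illustrates how the transformation pulls in the upper tail of a skewed variable.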
The input data were assembled in the statistical programming language R (R Core Team 2013), which was then used to run the k-means algorithm 10,000 times in order to identify an optimal and robust partitioning of the areas into an 8-cluster solution. Repeated runs are necessary as the algorithm outputs are sensitive to the initial seeding of the k clusters. Eight classes were identified as a parsimonious solution that also matched the 2011 OAC.
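The logic of repeated runs can be shown with a small, self-contained sketch. It is written in plain Python for illustration only (the analysis itself used R), uses a basic Lloyd-style k-means with random initial centres, and assumes that the preferred run is the one with the lowest total within-cluster sum of squares; the source does not state the selection criterion, so that choice is an assumption.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from randomly sampled initial centres."""
    centres = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the previous centre if a cluster empties
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    wcss = sum(sq_dist(p, centres[l]) for p, l in zip(points, labels))
    return wcss, labels, centres


def best_of_runs(points, k, n_runs, seed=0):
    """Repeat k-means n_runs times and keep the run with the lowest WCSS."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(n_runs))
    return min(runs, key=lambda run: run[0])


# Toy data: two well-separated groups of three points each.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
wcss, labels, centres = best_of_runs(points, k=2, n_runs=20)
```

Each run starts from a different random seeding, so individual runs may settle in different local optima; keeping the best of many runs reduces the dependence of the final partition on any single initialisation.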
An alternative method might have been to build two separate classifications for 2001 and 2011, akin to developing a standard census geodemographic system for each. However, such classifications would be optimised against different distributions of input values, and the aim here was to establish linked clusters drawn from the same input data and then to use these to examine how the relationship between areas and the cluster means changed over time. This was made possible because a common set of attributes was available for both 2001 and 2011. With this method, the clusters might be conceptualised as an optimised assignment of areas into groups derived from an average of the two time periods; i.e. the clusters represent the best fit for the whole time period, rather than for the census night of either 2001 or 2011.
Once the common classification had been created, it could be split post-clustering and mapped for the two time periods. For the purpose of this analysis, we created only a single tier (‘Super Group’) classification in order to map the main cluster changes between 2001 and 2011.
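Splitting the pooled classification by year and comparing assignments can be sketched as below. The record layout `(area_id, year, cluster)` and the area codes are hypothetical, and the Python code is illustrative rather than a reproduction of the authors' R workflow.

```python
from collections import Counter


def split_by_year(records):
    """Turn pooled (area_id, year, cluster) records into {year: {area_id: cluster}}."""
    by_year = {}
    for area_id, year, cluster in records:
        by_year.setdefault(year, {})[area_id] = cluster
    return by_year


def transitions(by_year, start=2001, end=2011):
    """Count areas by (cluster in start year, cluster in end year)."""
    return Counter(
        (by_year[start][area], by_year[end][area])
        for area in by_year[start]
        if area in by_year[end]
    )


# Hypothetical pooled output: each area appears once per census year.
records = [
    ("E1", 2001, 3), ("E1", 2011, 3),
    ("E2", 2001, 8), ("E2", 2011, 6),
    ("E3", 2001, 8), ("E3", 2011, 6),
]
by_year = split_by_year(records)
changes = transitions(by_year)
changed_areas = sorted(a for a in by_year[2001]
                       if by_year[2001][a] != by_year[2011][a])
```

Because both years were clustered together, a cluster label has the same meaning in 2001 and 2011, so a changed label for an area can be read directly as a change in its assignment.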
In common with the 2001 and 2011 OACs, we also created short descriptions of the clusters, and assigned representative labels to aid end-user interpretation of the main cluster characteristics. There are multiple ways in which this process can be accomplished, and our preferred technique was to calculate a “grand index”, representing the deviation of the classification input attributes within each cluster away from their national representation in the pooled data sets for 2001 and 2011. Such scores are typically standardised so that 100 would represent the national average over the entire data set, 200 a rate of double and 50 a half. Options for creating index scores included separating out the 2001 and 2011 clusters and input attributes, and calculating separately; or, combining both years together, and calculating index scores on the basis of both 2001 and 2011 inputs combined. The latter method was selected because it was the combined data that were used to form the clusters, thus also maintaining the unified approach for cluster description. The cluster index scores for the selected variables are presented in Table 2.
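The index calculation can be illustrated with a short Python sketch. The source does not give the exact formula, so this assumes the index is 100 times the ratio of a cluster's mean rate to the mean rate across all pooled 2001 and 2011 records; the function name and the example values are hypothetical.

```python
def index_scores(values, labels):
    """Index each cluster's mean rate against the pooled mean (100 = average).

    values: one rate per pooled record (2001 and 2011 records together)
    labels: the cluster label of each record, in the same order
    """
    overall = sum(values) / len(values)
    sums, counts = {}, {}
    for value, label in zip(values, labels):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: 100.0 * (sums[label] / counts[label]) / overall
            for label in sums}


# Hypothetical rates for one variable: one record in cluster "A", two in "B".
scores = index_scores([40.0, 10.0, 10.0], ["A", "B", "B"])
```

In this example the pooled mean is 20, so cluster "A" (mean 40) scores 200, a rate double the average, and cluster "B" (mean 10) scores 50, a rate of half.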
Cluster labels and descriptions are as follows:
Cluster 1: suburban diversity. These areas are typically suburban in location, with very high ethnic diversity. Populations are typically young, and many families have dependent children. There are above average numbers of residents from newer EU countries, and crowded, privately rented terraced housing is common. Perhaps given lower rent values within these areas, they are also attractive to students. Although unemployment is higher than average, those who are in work tend to be employed in manual occupations such as warehousing, transport, accommodation and food services.
Cluster 2: ethnicity central. These are areas of very high ethnic diversity, with especially high prevalence of Black and Bangladeshi residents. Many households have young children, and rates of divorce are higher than the national average. There are also high numbers of students living within these areas. The dominant housing stock is flats, with many overcrowded and rented from the public sector. Unemployment within these areas is typically high, and as might be expected given their central locations, public transport is heavily used.
Cluster 3: intermediate areas. These areas have few distinctive features, apart from higher than average numbers of very elderly people living in communal establishments.
Cluster 4: students and aspiring professionals. Undergraduate and postgraduate students, as well as those who are starting their careers, are over-represented within these areas. Residents are ethnically diverse, with higher than average numbers of people identifying their origins as Chinese, Indian or being born in countries that acceded to the EU prior to 2001. The dominant housing stock is flats, which are typically rented within the private sector, and there is some overcrowding.
Cluster 5: county living and retirement. These rural areas are overwhelmingly White and house large numbers of people who work in agriculture, forestry and fishing. Of those not working, there are higher numbers of people who are past retirement age. Many people live in uncrowded detached houses, perhaps because children have aged and left the family home.
Cluster 6: blue-collar suburbanites. These suburban areas are dominated by terraced or semi-detached housing, with a higher than average number being socially rented. Employment is most typically in manufacturing, although many other blue-collar occupations are prevalent, such as construction.
Cluster 7: professional prosperity. The populations of these areas are most typically White and towards the latter stages of successful careers in a range of white-collar professional occupations. Most are married, and if they have had children, these are of an age where they are no longer dependent. Housing within these areas is typically privately owned and detached; higher incomes enable many households to sustain multiple car ownership.
Cluster 8: hard-up households. These deprived and predominantly White areas feature households from a full range of age groups. Those of working age experience higher than average rates of unemployment. Employed residents work in service or manual occupations. Housing within these areas is typically terraced or flats, with some overcrowding and very high rates of renting within the social housing sector.
These descriptions are exemplified in Fig. 1, which shows the changing complexion of three English cities: Bristol in the South West of England, and Liverpool and Leeds in the north. Both Liverpool and Bristol are predominantly urban areas (cluster 5, “county living and retirement”, has either no or limited representation), whereas Leeds represents a much larger local authority district, complementing its urban core with a more extensive hinterland and rural areas. There is no radical change in the assignments over the 2001–2011 period, in large part because the systems of property ownership and planning control preclude this. That said, Liverpool in particular has seen significant redevelopment and housing clearance during the last intercensal period (Sykes et al. 2013), and there is evidence of this within the core areas radiating from the Central Ward (see Footnote 4).