Introduction

Frailty is a state of vulnerability to minor stressors such as infections or falls, which leads to a disproportionate decline in overall health status and increased risk of disability and death [1, 2]. Frailty can be identified and classified by severity using either a phenotype model following a physical assessment [3], or by using a frailty score or ‘index’ based on an accumulation of conditions or deficits, which can be derived from routinely collected data or other longitudinal surveys of health status [4]. Pooled analysis of European data estimates the prevalence of frailty in older adults at 18% in all settings, and 12% in community-based studies [5]. Frailty is more common as age increases but is not synonymous with ageing; evidence suggests increased variability in frailty with ageing, but age only partly explains frailty trajectories [6, 7]. Factors including sociodemographics, specific long-term conditions, physical activity and education have been associated with frailty progression [8]. It has been estimated that a doubling in deficits (and therefore the frailty score) occurs over 12.6 years, although the small cohort size suggests further confirmation is needed [9]. There is still uncertainty as to the relationship between the rate of change of deficit accumulation and death [10, 11]. Current evidence is predominantly cross-sectional and there is contradictory evidence from studies with different methods of assessing frailty and varying analytical approaches, necessitating standardisation of study methods and greater use of longitudinal cohorts [8]. In addition, little is known about the onset of frailty and its development in middle-aged and younger old adults and the impact of frailty transitions in these age groups within typical primary care populations, making population health and social care planning across this part of the life-course difficult, given that frailty is already prevalent in people by the age of 65 [9].

Although frailty research has expanded over the last decade, use of different measures and concepts of frailty, small cohort sizes that are not representative of the wider older population, and variable lengths of follow-up have led to uncertain conclusions on the occurrence and progression of frailty within ageing populations [12], particularly in primary care. There is therefore a need for larger, longer, representative cohort studies which use standardised methods of establishing frailty, include key covariates related to frailty risk and risk of other adverse outcomes, and give clear descriptions of populations and methods. An electronic Frailty Index (eFI) has been developed and validated using routine primary care data. It has demonstrated robust predictive validity for mortality, hospitalisation and nursing home admission, outcomes associated with the vulnerabilities attributed to frailty, and can detect longitudinal changes at an individual level [13, 14]. The use of the eFI as a risk stratification tool is expanding and has become standard throughout primary care in England, making the eFI an appropriate measure for epidemiological studies of frailty in primary care at population level. As the eFI score can be calculated from routine primary care records, it is possible to apply the tool to the study of large datasets to generate widely understood and transferable findings [15, 16].

The cohort described in this paper is part of a larger project which aims to describe the epidemiology and dynamics of frailty in the adult primary care population (aged ≥50 years) and its impact on patients and health and social care services and costs [17]. These analyses will inform the development of a System Dynamics simulation model to predict future trends in frailty and related service demand for the purpose of developing guidelines and tools to facilitate commissioning and service development [18].

This paper gives an overview of the cohort dataset and key variables, and presents baseline descriptive data both for the general practices from which patient records originated and for the participants.

Methods

Study design

Retrospective cohort using routinely collected electronic health records.

Study setting

The UK has a registration-based primary healthcare system where patients are registered with a single general practice. Patients are allocated a personal lifetime unique identifier (a National Health Service (NHS) number), which reduces the risk of duplicate records and facilitates the linkage of primary care to other healthcare datasets. The Oxford-Royal College of General Practitioners (RCGP) Research and Surveillance Centre (RSC) is a network of around 5% of primary care practices in England which contribute electronic health record data voluntarily. Practices registered with the RCGP RSC have been shown to be nationally representative in terms of the population served and health outcomes [19, 20].

Eligibility criteria

The inclusion criteria were patients (i) aged ≥50 years, (ii) registered at a general practice contributing to the Oxford-RCGP RSC network database and (iii) registered at any time between 2006 and 2017. Potential duplicate patients were excluded, i.e. those with more than one sex recorded, duplicated calendar years of data and differing birthdates in the patient record, as were patients with missing or impossible birthdates. Yearly records were excluded as follows: (i) where a patient changed practice within a calendar year and had duplicate yearly records, the yearly record with the longest period was kept if continuous years of data were available; (ii) person-years of data following a gap of 1 year or more in the observation record were excluded, even if the patient re-registered with an RCGP RSC practice.
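The exclusion rules above can be illustrated with a minimal sketch. It is not the extraction code used for the cohort: the function names, the input shapes and the definition of an "impossible" birthdate (here, a date after a reference date) are assumptions made for illustration, and the sketch treats any single criterion as sufficient for exclusion.

```python
from datetime import date
from typing import Iterable, List, Optional, Set


def is_excluded_patient(
    sexes: Set[str],
    birthdates: Set[Optional[date]],
    reference: date,
) -> bool:
    """Flag a patient record as a potential duplicate or as unusable.

    Assumption: one criterion alone is enough to exclude, and an
    "impossible" birthdate is one that falls after `reference`.
    """
    if len(sexes) > 1:
        return True  # more than one sex recorded
    if len(birthdates) != 1:
        return True  # no birthdate at all, or differing birthdates
    (birthdate,) = birthdates
    if birthdate is None or birthdate > reference:
        return True  # missing or impossible birthdate
    return False


def years_before_first_gap(years: Iterable[int]) -> List[int]:
    """Keep calendar years up to the first gap of one year or more.

    Person-years after the gap are dropped even if the patient
    later re-registers.
    """
    kept: List[int] = []
    for year in sorted(set(years)):
        if kept and year - kept[-1] > 1:
            break  # a calendar year is missing: stop here
        kept.append(year)
    return kept
```

For example, a patient observed in 2006, 2007, 2009 and 2010 would keep only 2006 and 2007 under this rule.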

Data from all participants constitutes the open cohort, i.e. a cohort which included entry of new registrants such as patients turning 50 and those entering RCGP RSC practices from other areas. A closed cohort was also defined, including only eligible participants present in the cohort index year of 2006. These two cohorts will be used for different analyses during the project and allow for exploration of cohort ageing and of the impact of frailty in the overall population on service use.

Data sources

Electronic health records (EHR) were extracted from the RCGP RSC dataset. Publicly available datasets were linked to the primary care data, including the Income Deprivation Affecting Older People Index (IDAOPI) 2015 [21], geographical information from the geography portal of the Office for National Statistics (ONS) [22] and workforce data for GP practices [23].

Measures

Electronic frailty index

The electronic frailty index (eFI) was used to identify and grade the severity of frailty [14]. The eFI includes 36 deficits across disease states, symptoms/signs, abnormal laboratory values and indicators of disability which were identified according to standard methods for creating a frailty index [24]. An eFI score is derived as the number of deficits present as an equally weighted proportion of the total possible [14]. Using the same Read codes (Clinical Terms Version 3 - CTV3) as in the original derivation of the score, variables for each deficit were created and flagged as ‘present’ if the relevant Read codes were present in the patient EHR at any point in their prior medical history, assessed on 1 January and 1 July of each calendar year for each participant. As this method retrieves codes from the patient’s complete medical record, there is no missing data for any of the deficits. An eFI score (total deficits at each 6-monthly cut-point divided by 36) was then generated, and categorised into frailty states indicating increasing severity and risk of poor outcomes as: fit (0–0.12), mild (0.13–0.24), moderate (0.25–0.36) or severe (>0.36) [14]. As the eFI is designed to be a cumulative index, reversals (for example data artefacts due to change in GP practice during follow-up) were imputed to the previous frailty state, apart from the polypharmacy deficit, which is calculated from prescription information from the previous 12 months.
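The scoring and categorisation described above can be sketched as follows. This is an illustration only, not the study's implementation: the function names are hypothetical, the score is assumed to be rounded to two decimal places before the published cut-points are applied (so that the bands have no gaps), and the carry-forward step works on whole frailty states without modelling the separate treatment of the polypharmacy deficit.

```python
from typing import Iterable, List

TOTAL_DEFICITS = 36  # number of eFI deficits, as stated in the text

# Frailty states in order of increasing severity.
STATES = ["fit", "mild", "moderate", "severe"]


def efi_score(deficits_present: int, total: int = TOTAL_DEFICITS) -> float:
    """Equally weighted proportion of deficits present."""
    if not 0 <= deficits_present <= total:
        raise ValueError("deficit count out of range")
    return deficits_present / total


def efi_category(score: float) -> str:
    """Map a score to a frailty state using the published cut-points.

    Assumption: the score is rounded to two decimals first, so the
    bands 0-0.12, 0.13-0.24, 0.25-0.36 and >0.36 are contiguous.
    """
    rounded = round(score, 2)
    if rounded <= 0.12:
        return "fit"
    if rounded <= 0.24:
        return "mild"
    if rounded <= 0.36:
        return "moderate"
    return "severe"


def carry_forward_states(states: Iterable[str]) -> List[str]:
    """Impute apparent reversals to the previous frailty state."""
    imputed: List[str] = []
    for state in states:
        if imputed and STATES.index(state) < STATES.index(imputed[-1]):
            state = imputed[-1]  # reversal: keep the previous state
        imputed.append(state)
    return imputed
```

Under these assumptions, a patient with 5 of 36 deficits has a score of about 0.14 and is categorised as mild, and the sequence fit, mild, fit, moderate becomes fit, mild, mild, moderate.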

Sociodemographic variables

Sociodemographic variables include age, sex, Indices of Multiple Deprivation (IMD) quintiles and IDAOPI quintiles. The IMD is a small-area measure of socioeconomic status based on postcode, ranked nationally, which includes seven domains: income, employment, education/skills/training, health and disability, crime, barriers to housing and services, and living environment [25]. The IDAOPI is a subset of the Income Deprivation Domain, and focusses specifically on the percentage of the population aged 60 and over who receive income support, income-based jobseeker's allowance, pension credit or child tax credit, including their partners aged ≥60. The 2015 deprivation indices were related to the last known patient address in the dataset or, where missing, were imputed using the IMD or IDAOPI indices related to the GP practice address (3.6% of patients). Ethnicity data was maximised using a customised ontology and coded into categories (Asian, Black, White, Mixed/other) [26]. The most recent ethnicity reported in the patient record was used as the baseline ethnicity value to reduce missing values in the year of entry to the cohort. Age was categorised into four groups, reflecting groupings reported in literature relating to older adults’ healthcare, and cut-offs for services reported by the study Stakeholder Engagement Group (SEG): 50–64, 65–74, 75–84 and ≥85. Receipt of residential care during the cohort period for each patient was coded ‘yes’ or ‘no’ by using a combination of Read codes [13] and use of a household key (11 or more patients at the same address with a median age of 50 or above) for the patient’s last known address at the date of data extraction (May 2019).
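Two of the derived variables above, the age group and the household key, can be sketched briefly. The function names are hypothetical, and the household check covers only the address-based part of the residential care definition; the Read code component is not modelled here.

```python
from statistics import median
from typing import Sequence


def age_group(age: int) -> str:
    """Assign one of the four age groups used in the cohort."""
    if age < 50:
        raise ValueError("cohort members are aged 50 or over")
    if age <= 64:
        return "50-64"
    if age <= 74:
        return "65-74"
    if age <= 84:
        return "75-84"
    return ">=85"


def household_key_flag(
    ages_at_address: Sequence[int],
    min_residents: int = 11,
    min_median_age: int = 50,
) -> bool:
    """Flag an address as a likely residential care setting.

    Thresholds follow the text: 11 or more patients at the same
    address with a median age of 50 or above.
    """
    if len(ages_at_address) < min_residents:
        return False
    return median(ages_at_address) >= min_median_age
```

An address shared by 11 patients aged 80 would be flagged, whereas one shared by 10 such patients, or by 11 patients aged 30, would not.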

Clinical variables

Individual eFI deficits, specific common long-term conditions not included in the eFI (e.g. COPD, asthma, rheumatoid arthritis) and conditions present in the Quality and Outcomes Framework (QOF) (e.g. dementia, depression, cancer, obesity), with dates of onset, were generated on a yearly basis for each patient, ascertained from the whole patient medical history. Other annual data included smoking status and influenza and pneumococcal vaccinations. All body mass index (BMI) measurements present in the patient record were provided. Due to the differences in availability of values, a baseline BMI value was defined as the first recording in a patient’s cohort entry year, or, where missing, the first value in the nearest previous year to cohort entry (up to a maximum of 2 years) or the nearest year afterwards (up to 2 years).
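The baseline BMI rule can be sketched as a small lookup. The search order is an assumption: the text does not say whether earlier years take priority over later ones, so this sketch checks the entry year, then 1 and 2 years before, then 1 and 2 years after. The function name and input format are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple


def baseline_bmi(
    measurements: List[Tuple[date, float]],
    entry_year: int,
    max_offset: int = 2,
) -> Optional[float]:
    """Pick a baseline BMI near the cohort entry year.

    Assumed order: entry year, then previous years (nearest first),
    then later years (nearest first), each up to `max_offset` years.
    """
    first_in_year: Dict[int, float] = {}
    for when, value in sorted(measurements):
        # sorted() puts the earliest date first, so setdefault keeps
        # the first recording in each calendar year
        first_in_year.setdefault(when.year, value)

    offsets = [0]
    offsets += [-k for k in range(1, max_offset + 1)]
    offsets += [k for k in range(1, max_offset + 1)]
    for offset in offsets:
        year = entry_year + offset
        if year in first_in_year:
            return first_in_year[year]
    return None  # no usable value within the window
```

A patient entering in 2006 with measurements only in 2005 and 2008 would be assigned the 2005 value under this ordering.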

General practice variables

General practice information included the geographical region, urban/rural indicators based on the 2011 rural/urban classification (RUC11) [27], IMD and IDAOPI for the practice postcode, number of patients registered in the practice, and total practice consultations per year. The total general practitioner (GP), nurse and overall practice staff full-time equivalent (FTE) for each general practice in 2013 (the first year this information is available to be linked on practice code) was included [28]. Each calendar year of participant data was linked to a general practice identifier and dates of the participant registering and leaving the RCGP RSC practice were provided.

Death

The month and year of death were provided. Primary care death data in the RCGP dataset has been shown to be accurate for this calendar period [29, 30].

Service use

Primary care service use outcomes include number of days in a calendar year with a consultation by consultation type (administrative, face-to-face, telephone, electronic consultation, or home visit), total number of GP prescriptions and number of unique prescriptions by British National Formulary (BNF) chapter [31].
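As an illustration of the consultation measure, the sketch below counts distinct days with a consultation per calendar year and consultation type. The input format (a date paired with a type label) and the function name are assumptions, not the structure of the RCGP RSC extract.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Set, Tuple


def consultation_days(
    consultations: Iterable[Tuple[date, str]],
) -> Dict[Tuple[int, str], int]:
    """Count days with at least one consultation, by year and type.

    Several consultations of the same type on the same day count
    as a single day.
    """
    days: Dict[Tuple[int, str], Set[date]] = defaultdict(set)
    for when, kind in consultations:
        days[(when.year, kind)].add(when)
    return {key: len(dates) for key, dates in days.items()}
```

Two telephone consultations on the same day therefore contribute one day to that year's telephone count.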

Statistical analysis

A description of the primary care data from the open cohort is presented in this paper. The characteristics of RCGP RSC practices with participants in the cohort are described for the calendar year 2006 (first year of cohort). Age category distributions for both the open and closed cohorts were analysed and presented graphically for the calendar years 2006–2017. The reasons for exit from the cohort are summarised. Patient sociodemographic and clinical characteristics at the year of cohort entry (i.e. for the open cohort) are described according to the four age groups, and missing data quantified.

Results

The open cohort comprised 2,177,656 patients, contributing 15,552,946 person-years of data (Fig. 1).

Fig. 1 Cohort definition

Practice characteristics

Four hundred and nineteen primary care practices distributed across England contributed to the cohort between 2006 and 2017 inclusive. Practices varied widely in their patient numbers and consequently their totals of yearly consultations (Table 1). Practices reflected population distributions throughout England, with 78% in urban areas and an even spread across IMD quintiles.

Table 1 Primary care practice characteristics in 2006 (n = 419)

Patient baseline characteristics

The sociodemographic and clinical baseline characteristics of participants in their year of entry to the cohort are presented in Tables 2 and 3. The mean age of participants was 61 years (SD 12). Demographic trends with increasing age were observed, including a higher proportion of female sex, lower ethnic diversity and rural residence in the older age groups. Ethnicity data was more likely to be missing with increasing age, decreasing deprivation, male sex, urban location and for people in residential care. Patterns of indices of deprivation appeared similar across age groups, with half the cohort located in the two least deprived quintiles. Long-term conditions were more prevalent in older age groups at baseline, with the exception of depression and obesity which were more common in younger age groups. The eFI score increased with age, as did the proportion of participants in the mild, moderate and severe frailty categories. The proportion of people with frailty at cohort entry increased from 10% in the 50–64 age group to 69% in people aged ≥85.

Table 2 Participant sociodemographic baseline characteristics
Table 3 Participant baseline clinical characteristics

Follow-up

Participant data was extracted for the 12-year period from 2006 to 2017, inclusive. There were 1,107,481 eligible patients in the first year of the cohort (2006), increasing to 1,491,954 at the beginning of 2017, with a total of 1,070,175 new participants joining the cohort between 2007 and 2017. Patients contributed a mean of 5 years of data, with 647,239 patients (58.4%) who were present in the first cohort year (2006) having the full 12 years of data. Patients present in 2006 comprised 50.9% of the cohort and contributed 67.0% of the total person-years.
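The counts and percentages in this paragraph can be cross-checked with a few lines of arithmetic. The variable names are illustrative; the figures are those reported above and in the first line of the Results.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


open_cohort = 2_177_656      # all patients in the open cohort
present_2006 = 1_107_481     # eligible patients in the first year
new_entrants = 1_070_175     # joined between 2007 and 2017
full_follow_up = 647_239     # present in 2006 with all 12 years

# Entrants in 2006 plus later entrants give the open cohort total.
total_check = present_2006 + new_entrants         # 2177656

# Share of the open cohort already present in 2006.
share_2006 = pct(present_2006, open_cohort)       # 50.9

# Share of the 2006 patients with complete 12-year follow-up.
share_full = pct(full_follow_up, present_2006)    # 58.4
```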

Between 2006 and 2017, 137,481 patients died (6.3% of cohort) and 635,400 patients moved out of an RCGP RSC practice (29.2% of the cohort). The full details of entry and exit to the cohort by calendar year according to age groups and frailty category at cohort entry are given in Supplemental Tables 1 and 2. There was an inflow of new participants over the cohort period, across all age groups and frailty categories, which was more notable in younger age groups. The mean follow-up period increased from 4.8 years in people categorised as fit at cohort entry to 7.4 years in people categorised as severely frail (Supplemental Table 3). The age distribution over the cohort period for the closed cohort (participants who were present in 2006 onwards, showing attrition due to death and leaving RCGP RSC practices) and the open cohort (participants present in 2006 plus those moving into an RCGP RSC practice and people turning 50) is given in Fig. 2.

Fig. 2 Age group distribution over cohort period

Discussion

There is an urgent need for a better understanding of current and future care requirements for people with frailty, including those in middle age to the younger old, where there is likely to be unmet need. This cohort of approximately 2.2 million adults aged 50 and over, with long-term follow-up data on frailty status using the electronic Frailty Index (eFI) applied to routinely collected primary care data, is the largest cohort so far that will be used for longitudinal analysis to explore frailty dynamics and its impact using linked hospital and mortality data. The dataset includes adults aged 50 and over from around 5% of general practices from a single country (England), thus meeting the project aims to provide a whole-system analysis of representative population-level data. The general practices from which the cohort is derived vary in their characteristics, with a range of geographical locations, urban/rural mix, practice sizes and areas of differing deprivation, further demonstrating the representativeness and generalisability of the RCGP RSC dataset [19]. This diversity will reflect a variety of care settings and approaches to managing people with frailty, so that the subsequent simulation models can reflect population and care heterogeneity.

Characteristics of patients at cohort entry show expected patterns in sociodemographic variables and trends in clinical conditions and lifestyle factors, further demonstrating representativeness of the data and suitability for planned analyses. The observed trends in increasing eFI with age group and the greater proportions of moderate and severe frailty categories in older age groups reflect current knowledge, and the observed presence of frailty in 10% of adults aged 50–64 in our cohort highlights the importance of examining frailty and its trajectories earlier in life. The study design of an open cohort allows entry of younger patients throughout its duration, representing a dynamically ageing population in which the overall mean age increases over the cohort period. The cohort therefore includes substantial data on middle-aged adults which is novel, and also crucial for observing potential earlier manifestations of frailty and its progression over time, as it is likely that future interventions to reduce incidence and progression to earlier stages of frailty may be targeted at this age group.

Study strengths and limitations

This dataset is derived from a single national health service (NHS), in which registered patients are managed in accordance with specific guidelines and broadly similar care pathways even though they live in different regions of the country and are registered with different primary healthcare practices. Primary care is free at the point of delivery, as is hospital care. GP registration coverage in England is high [32]; therefore, data from the primary care population are essentially representative of the overall population. However, the cohort does not include information from private healthcare, which is most commonly available via private medical insurance and accessed by around 11% of the UK population, although schemes have limited cover for general practice [33]. Information on adult social care, which is means tested, is also not available as it is organised via a mix of state, private and voluntary providers.

The large size of the dataset enables the stratification of the cohort by key characteristics whilst maintaining sufficiently sized subgroups to provide precise estimates to inform services for geographical areas with particular population characteristics. The dataset includes a wide range of covariates which have been identified in other studies as being associated with either frailty onset or outcomes following frailty occurrence. However, covariates such as social factors (e.g. loneliness, living situation) and contemporaneous information on residential care status are not available. To address this, information on the impact of such factors on outcomes will be sourced from the literature and included within ongoing analysis supporting the development of the simulation model. Large EHR datasets will necessarily reflect local differences in data entry and coding procedures, and accuracy of the eFI will also depend on timely attendance of patients to healthcare services to enable assessment and diagnosis, and on transfer of information between different healthcare services. The cumulative nature of the eFI and its validation in large routine healthcare datasets from different countries [34,35,36] demonstrate that the eFI is capable of identifying frailty as a vulnerability, and suggest that variability in coding accuracy or completeness does not have a significant impact on the ability of the eFI to identify changes in frailty status in ageing populations.

Approximately one third of participants have no ethnicity data, which is self-reported administrative data more prone to be missing than clinical data. There appears to be under-reporting of non-white ethnicity, with participants identifying as ‘white’ comprising 92% of the given values, compared with 86% in the 2011 Census (data for England and Wales) [37]. It is possible that the under-representation of people from ethnic minorities could also be due to recognised issues with lower primary healthcare usage, rather than practices not reflecting their catchment populations, or due to a reporting bias in the primary care data [38]. However, inclusion of ethnicity data from Hospital Episode Statistics (HES) in future planned analyses should increase the proportion of the cohort with recorded ethnicity [39].

During the cohort period, there was significant movement of participants both into and out of the cohort, reflecting real-life population flows which are key parameters for simulation modelling of population health needs. Frailty data and patient and service use outcomes are being collected for each year that the patient is registered, thus providing full outcome ascertainment for each year of participation. The 65–74 group had the lowest number of exits, and higher numbers in other age groups could be a consequence of greater mobility in the working age population or moves related to higher levels of support in older age groups, for example following a health or social care crisis [40]. This could lead to an underestimation of incidence and progression of frailty in our cohort, although participants of older age and greater frailty severity also had the longest follow-up periods. Although the location of practices seemed equally distributed across the deprivation quintiles, there was a trend across all age groups for greater representation of patients from the least deprived quintiles, perhaps reflecting movement to wealthier retirement areas and the suburbs.

The 36 eFI deficits and other long-term conditions were defined using Read codes. For future data extractions, migration of clinical term definitions from Read codes to the Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT®) will be necessary to reflect national harmonisation of coding tools across the healthcare pathway [41].

Summary of planned analyses

This dataset will be linked with outcome data on secondary care service use from NHS Digital so that population-level parameters for the simulation model, for example the numbers of hospital admissions for people in different frailty states in each calendar year, can be calculated. The long follow-up period, averaging 5 years and with more than 600,000 participants followed for the full 12 years, is essential for moving beyond exploring the impact of frailty states and associated outcomes at a defined timepoint. The time frame allows many transitions between frailty states to be observed over the cohort period and gives ample scope for investigating frailty trajectories in the population to inform population-level planning for required services.

Epidemiological and statistical techniques will be employed to describe the incidence and prevalence of frailty over the cohort period and within population subgroups and rates of transitions between frailty states. Relationships between patient characteristics and frailty transitions and associations between frailty states and outcomes, including hospitalisations, mortality and service use, will be explored. Costs will be attributed to service use and predictions of future service use and costs under different scenarios will be produced using the simulation model.

Current evidence suggests that socioeconomic and educational deprivation is associated with higher frailty scores [42]. Analysis of this cohort will enable an in-depth analysis of the patterns of frailty onset and transitions according to social deprivation, which will inform decisions on whether this potentially high-risk population may benefit from targeted earlier intervention to improve outcomes.

The geographical range of the data will enable parameterisation of the simulation model with more localised data, permitting adaptation to reflect local situations and facilitating planning on whichever scale is the most appropriate. If there is interest in looking at the data for particular GP characteristics, for example focussing on larger practices or practices positioned in more deprived areas, the primary analyses can be re-run for the required subgroups to adjust parameters for the simulation model.

Conclusions

This cohort is the largest dataset compiling frailty transitions and outcomes and will provide unique information on frailty within middle-aged to young-old populations. Data presented here show that frailty is already common in middle age and continues to increase across the later life course. The cohort will allow exploration of the impact of morbidities, socioeconomic and lifestyle factors on frailty onset, trajectories and outcomes over time. Strengths of this cohort are the use of large-scale routine primary care data with linkage to secondary care and healthcare costs, and a dynamically ageing population with lengthy follow-up, enabling novel insights into the onset and progression of frailty.