The incidence, prevalence, and survival of systemic sclerosis in the UK Clinical Practice Research Datalink

We aimed to estimate the incidence, prevalence, and survival of systemic sclerosis in the United Kingdom. We conducted a historical cohort study using data from the Clinical Practice Research Datalink (CPRD). We calculated the incidence and survival of systemic sclerosis between 1994 and 2013 and examined its association with age, sex, and socioeconomic status. We calculated point prevalence on 1 July 2013 and examined its association with the same exposures. We identified 1327 cases of incident systemic sclerosis. Annual incidence was 19.4 per million person-years between 1994 and 2013. The incidence was 4.7 times higher in women than in men, was not influenced by socioeconomic status, and remained stable over the 20-year study period. The peak age of onset was 55–69 years. Survival at 1, 5, and 10 years was 94.2, 80.0, and 65.7%, respectively. The prevalence was 307 (290–323) per million, with the highest prevalence in the 70–84 years age group. We estimate there are currently 1180 new cases of systemic sclerosis each year in the UK, and 19,390 people living with systemic sclerosis. Due to the predicted growth and aging of the population, we predict a 24% increase in incident cases and a 26% increase in prevalent cases in 20 years’ time. Our estimates of incidence and prevalence are higher than previously reported in the UK, but similar to recent USA and Swedish studies, and do not support a north-south gradient in the occurrence of systemic sclerosis in Europe.


Introduction
Systemic sclerosis is a rare autoimmune disease of unknown etiology characterized by skin fibrosis and internal organ involvement. The reported incidence and prevalence of systemic sclerosis vary widely, with incidence estimated between 4 and 43/million person-years [1][2][3], and prevalence between 88 and 443/million [4,5]. Incidence and prevalence may be influenced by race [6,7], but whether there are true geographical differences in occurrence in Caucasian populations is less clear.
The literature before 2006 is summarized in a systematic review [8], which proposed a North-South gradient in Europe with lower rates in Northern European countries (UK, Finland, and Iceland [1,2,5,9]) compared to Southern European ones (France and Greece [7,10]). Studies published since then have continued to report high incidences in Southern Europe (Spain, Croatia and Italy) [3,11,12] but contradictory rates in Northern Europe, with a low annual incidence of 6-11 per million in Norway [13] and a higher rate of 19/million person-years in Southern Sweden [14]. Incidence and prevalence in the USA, Canada, and Australia are reported at the higher end of this range [4,15,16].
Previous studies of the epidemiology of systemic sclerosis in the UK were small [1,5,17], and the epidemiology of systemic sclerosis has never been examined in a nationwide population-based study, which reduces the biases that are inherent in smaller hospital-based studies. The availability of the Clinical Practice Research Datalink (CPRD), a longitudinal database of consultation-based patient records from UK general practice, gives us an opportunity to examine the epidemiology of systemic sclerosis in a large population that is representative of the UK population [18].
Significance and innovations
• The incidence and prevalence of systemic sclerosis in the UK are higher than previous estimates
• The incidence of systemic sclerosis is similar in the USA and Europe, and our findings do not support a European North-South gradient
• Socioeconomic status has no impact on incidence or mortality in systemic sclerosis
Understanding the incidence and prevalence of systemic sclerosis will help to address the healthcare needs and aid service planning for this rare disease, both now and in the future. Such service planning may include the setting up, staffing, and resourcing of specialist treatment centers. It may also add to the debate about whether occurrence of systemic sclerosis is lower in Northern Europe.
The aim of this study was to estimate the incidence, prevalence, and mortality rates of systemic sclerosis in the UK and to explore temporal trends in its incidence. We also investigated the effects of age, sex, and socioeconomic status on incidence, prevalence, and mortality.

Study design and population
The UK healthcare system is well-suited to capturing diseases which are managed both in hospitals and in the community. In the UK, everybody is registered with a general practitioner (GP) who co-ordinates their healthcare including referrals to secondary care. For example, when a patient is discharged from hospital, or attends a hospital clinic, a letter is written to inform the GP of any new diagnoses, and these are added to the GP record. By using GP data, we would expect to capture people with the full spectrum of scleroderma, from the mildest to the most severe, and would not expect to need to interrogate hospital databases or discharge letters from hospital to identify additional cases.
This is a historical cohort study containing all 684 general practices contributing data to the CPRD in 2015. The CPRD is a longitudinal general practice database of approximately 13 million people who have contributed data since 1987, and approximately 6% of the UK population are currently contributing data [19]. The database contains general practices from all four countries in the UK and includes information on demographics, diagnoses, referrals, medications, and tests. It is deemed to be representative of the UK population [18,20]. We followed the CPRD's recommendations for selecting research quality patient records and periods of quality data recording by including people contributing "acceptable" quality data in "up to standard" practices. Our study was conducted between 1 January 1994 and 31 December 2013.

Case definition
We compiled lists of Read codes for a diagnosis of systemic sclerosis by searching the description fields of the Read code dictionary and excluding irrelevant codes, using a method described by Dave and Petersen [21]. Synonyms searched for were systemic sclerosis, scleroderma, and CREST. Only Read codes that were specific for a diagnosis of systemic sclerosis were used, and codes for localized scleroderma were excluded. Read codes are available as an online supplement and at clinicalcodes.org [22]. We did not validate the diagnosis of systemic sclerosis externally, because (1) validation of other chronic autoimmune diseases in the CPRD has shown positive predictive values of > 90% [23,24], and (2) GPs would be unlikely to give a patient a Read code for systemic sclerosis unless it had been confirmed by a hospital specialist [25]. Incident cases were defined as people with a first record of a Read code during the study period, and prevalent cases were people who had ever had a code for systemic sclerosis. We only included incident cases with at least 1 year of disease-free follow-up in the CPRD prior to their diagnosis in order to reduce the chance of prevalent cases being misclassified as incident cases [26].
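The incident-case rule above can be illustrated with a minimal Python sketch. It is not the authors' code (the analyses were run in Stata); the function name, the argument names, and the choice to treat "1 year" as 365 days are assumptions made purely for illustration.

```python
from datetime import date

STUDY_START = date(1994, 1, 1)
STUDY_END = date(2013, 12, 31)
MIN_LOOKBACK_DAYS = 365  # assumption: "at least 1 year" taken as 365 days


def is_incident_case(followup_start: date, first_code: date) -> bool:
    """Return True if the first systemic sclerosis code falls inside the
    study period and follows at least one year of code-free follow-up."""
    in_study_period = STUDY_START <= first_code <= STUDY_END
    lookback_days = (first_code - followup_start).days
    return in_study_period and lookback_days >= MIN_LOOKBACK_DAYS


# Hypothetical patients: (start of quality follow-up, date of first code)
print(is_incident_case(date(2000, 1, 1), date(2005, 6, 1)))  # long look-back
print(is_incident_case(date(2005, 1, 1), date(2005, 6, 1)))  # under one year
```

A patient whose first code appears within a year of joining the database would be excluded as a possible prevalent case rather than counted as incident.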

Data sources
All data were extracted from the CPRD files except for the 2010 English Index of Multiple Deprivation (IMD10) which was used as a proxy for socioeconomic status. IMD10 was supplied, via a linkage agreement, by the Office for National Statistics (ONS) and was available for patients in English practices that had consented to participate in the linkage scheme (~60%) [27].
We calculated crude incidence rates, and stratified these by age group, sex, year of diagnosis, and IMD-10 quintile. The denominator was all people contributing acceptable quality data in up to standard practices to the CPRD during the study period. Unadjusted incidence rate ratios were obtained by fitting variables individually in separate Poisson regression models. Mutually adjusted incidence rate ratios were obtained by fitting age group, sex, and IMD-10 quintiles as a priori confounders in a single Poisson regression model.
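As a rough illustration of what an unadjusted incidence rate ratio represents, the sketch below computes a crude rate per million person-years and a rate ratio with a Wald confidence interval on the log scale. For a single binary covariate this is the same quantity a Poisson model with a person-time offset would estimate. The counts and person-years are hypothetical, not study data, and the function names are illustrative.

```python
import math


def rate_per_million(cases, person_years):
    """Crude incidence rate per million person-years."""
    return cases / person_years * 1_000_000


def rate_ratio_with_ci(cases_a, py_a, cases_b, py_b, z=1.96):
    """Crude incidence rate ratio (group A vs group B) with a Wald CI
    computed on the log scale: SE(log IRR) = sqrt(1/a + 1/b)."""
    irr = (cases_a / py_a) / (cases_b / py_b)
    se_log = math.sqrt(1 / cases_a + 1 / cases_b)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper


# Hypothetical counts: 100 cases vs 20 cases over equal person-time
irr, lower, upper = rate_ratio_with_ci(100, 5_000_000, 20, 5_000_000)
print(f"IRR {irr:.1f} (95% CI {lower:.1f}-{upper:.1f})")
```

Mutual adjustment for several covariates needs a full regression fit, which this sketch does not attempt.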
We calculated point prevalence per million people on 1 July 2013 and stratified this in the same way as for incidence. To calculate this, we divided the number of people with a Read code for systemic sclerosis who were alive and contributing data on this date by the total number of people alive and contributing data on this date. We used logistic regression models to calculate odds ratios (ORs) and 95% confidence intervals (CIs), which provides a good estimation of the relative risk for rare outcomes such as this. Unadjusted ORs were obtained by fitting variables individually in separate logistic regression models. Mutually adjusted ORs were obtained by fitting age group, sex, and IMD-10 quintiles as a priori confounders in a single logistic regression model.
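The claim that odds ratios approximate relative risks for rare outcomes can be checked numerically. The sketch below compares the two measures at a prevalence of the order seen in this study (hundreds per million) and at a deliberately common outcome; the specific proportions are hypothetical.

```python
def risk_ratio(p_exposed, p_unexposed):
    """Ratio of two proportions (relative risk)."""
    return p_exposed / p_unexposed


def odds_ratio(p_exposed, p_unexposed):
    """Ratio of the odds p / (1 - p) in the two groups."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical proportions: 500 vs 100 per million, then 50% vs 10%
rare = (500e-6, 100e-6)
common = (0.50, 0.10)

for label, (p1, p0) in (("rare", rare), ("common", common)):
    print(label, round(risk_ratio(p1, p0), 3), round(odds_ratio(p1, p0), 3))
```

With the rare outcome both measures are close to 5 (the odds ratio is about 5.002), whereas with the common outcome the odds ratio rises to about 9 while the relative risk stays at 5.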
Kaplan-Meier methods were used to estimate survival at 1, 5, and 10 years after diagnosis. We used Cox regression to estimate hazard ratios (HRs) for the effect of sex, age group, and IMD-10 quintile on mortality. Unadjusted HRs were obtained by fitting variables individually in separate Cox regression models. Mutually adjusted HRs were obtained by fitting age group, sex, and IMD-10 quintiles as a priori confounders in a single Cox regression model.
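For readers unfamiliar with the Kaplan-Meier product-limit estimator, a minimal pure-Python sketch follows. It is illustrative only: the follow-up times and event indicators are toy values, not CPRD data, and the function name is an assumption.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    times  -- follow-up duration for each person
    events -- 1 if the person died at that time, 0 if censored
    """
    records = sorted(zip(times, events))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = 0
        leaving = 0
        # Group everyone whose follow-up ends at time t
        while i < len(records) and records[i][0] == t:
            deaths += records[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # People censored at t still count as at risk at t
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy data: six people, follow-up in years
toy_times = [1, 2, 2, 3, 4, 5]
toy_events = [1, 0, 1, 1, 0, 1]
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 3))
```

Survival at 1, 5, and 10 years is then read off as the value of the curve at the last death time on or before each horizon.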
We used direct standardization of the age-specific incidence rates to the ONS age-stratified UK population now and projected UK population in 20 years, to estimate the number of expected incident and prevalent cases in the UK now and in 2037, assuming that incidence will not change over the next 20 years.
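The projection step amounts to applying fixed age-specific rates to two different population structures and summing the expected cases. A minimal sketch is given below; the age bands, rates, and population counts are hypothetical placeholders, not the ONS figures or the study's age-specific estimates.

```python
def expected_cases(rates_per_million, population):
    """Sum of age-specific rate x population size across age groups."""
    return sum(
        rates_per_million[group] * population[group] / 1_000_000
        for group in rates_per_million
    )


# Hypothetical age-specific annual incidence (per million) and populations
rates = {"0-39": 5.0, "40-69": 30.0, "70+": 25.0}
pop_now = {"0-39": 30_000_000, "40-69": 25_000_000, "70+": 8_000_000}
pop_future = {"0-39": 31_000_000, "40-69": 27_000_000, "70+": 12_000_000}

cases_now = expected_cases(rates, pop_now)
cases_future = expected_cases(rates, pop_future)
print(cases_now, cases_future)
```

Because the rates are held constant, any change in expected cases comes entirely from growth and aging of the population, which is the assumption stated in the text.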
All analyses were performed using Stata 14 statistical software (Statacorp, Texas, USA).

Ethics
Independent Scientific Advisory Committee (ISAC) for MHRA Database Research approval was obtained for this study on October 26, 2016 (protocol 16_190R).

Reporting guidelines
This study has been reported following the RECORD guidelines for reporting of studies conducted using observational routine-collected health data [28].

Results
Overall, we identified 1327 cases of systemic sclerosis in the CPRD during the study period (Fig. 1). Of these, 83.2% were female and the mean age at diagnosis was 58.0 years (SD 16.4).

Temporal trend
The incidence of systemic sclerosis did not change significantly between 1994 and 2013 (Table 1); the adjusted rate ratio corresponded to a 0.6% increase per year (P = 0.24).

Age, sex, socioeconomic status
The crude and adjusted incidence rate ratios are listed in Table 2. Incidence was higher among women than men, with an adjusted incidence rate ratio of 4.7 (95% CI 4.1-5.4), P < 0.0001. People aged 55-69 years had the highest incidence of systemic sclerosis. Socioeconomic status was not associated with incidence of systemic sclerosis (P for trend 0.63).

Survival
Over the study period, there were 302 deaths during 6929 years at risk contributed by people with incident systemic sclerosis generating a mortality rate of 43.6 (95% CI 38.9-48.8) per thousand person-years (Table 4). Risk factors for increased mortality were male sex and increasing age. However, there was no association between socioeconomic deprivation and mortality. The 1-, 5-, and 10-year survival was 94.2, 80.0, and 65.7% respectively.
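The mortality rate above follows directly from the reported deaths and person-years. The sketch below recomputes it with a confidence interval on the log scale; this CI method is an assumption (the text does not state which was used), although it gives limits that agree with the published figures to one decimal place.

```python
import math


def rate_with_ci(events, person_years, per=1000, z=1.96):
    """Event rate per `per` person-years with a log-scale Wald CI,
    using SE(log rate) = 1 / sqrt(events)."""
    rate = events / person_years
    factor = math.exp(z / math.sqrt(events))
    return rate * per, rate / factor * per, rate * factor * per


# Figures reported in the text: 302 deaths over 6929 person-years
rate, lower, upper = rate_with_ci(302, 6929)
print(f"{rate:.1f} ({lower:.1f}-{upper:.1f}) per 1000 person-years")
```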

Estimated UK incidence and prevalence
We estimate that at present, there are 1180 new cases of systemic sclerosis each year in the UK and 19,390 people living with systemic sclerosis. Population projections show a large expected increase in the proportion of the UK population in the 55+ age group [29] in 20 years' time, meaning that we estimate in 2037, there will be 1460 new cases (24% increase) and 24,430 people living with systemic sclerosis (26% increase).
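The percentage increases quoted above can be verified from the reported case counts with simple arithmetic; the helper name is illustrative.

```python
def percent_increase(current, projected):
    """Relative change from current to projected, in percent."""
    return (projected - current) / current * 100


# Case counts reported in the text (now vs 2037)
incident_change = percent_increase(1180, 1460)
prevalent_change = percent_increase(19_390, 24_430)
print(round(incident_change), round(prevalent_change))
```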

Main findings
Using the CPRD, we estimate that the annual incidence of systemic sclerosis in the UK population is 19.4 per million person-years. The incidence is nearly five times higher in women than in men, is not influenced by socioeconomic status, and has remained stable over the 20-year study period. The peak age of onset is 55-69 years. We estimate the prevalence of systemic sclerosis in the UK population to be 307 (290-323) per million, with the highest prevalence in the 70-84 years age group.

How our study fits in with other literature
Our study estimated incidence and prevalence in the UK to be higher than previously reported in Norway, Croatia, Greece, France, Taiwan, and Australia [6,7,10,12,13,16], similar to the USA, Sweden, and Spain [11,14,15], and lower than in Italy and Canada [3,4]. Our findings were consistent with other large population-based studies in Caucasians. A large study in the Detroit tri-county area of the USA between 1989 and 1991 estimated the incidence of systemic sclerosis to be 19.3 per million but with a lower prevalence of 242 per million, using 5 data sources including hospital records and a capture-recapture analysis [15]. A more recent study conducted in southern Sweden used a health register of 1.2 million pooled public and private health care records between 2006 and 2010 and estimated an incidence of 19 per million and a prevalence of 305 per million [14]. That study had approval to retrospectively review the medical records and validate the diagnosis of systemic sclerosis with reference to the 1980 ARA criteria [30], and the similarity between its findings and ours supports the reliability of our case ascertainment. Taken together [14], our studies challenge the idea that the prevalence of systemic sclerosis is lower in Europe than in the USA and Australia [8], or that it is lower in Northern Europe compared to Southern Europe [8].
Our estimates of incidence and prevalence are much higher than previous UK estimates. Our study is the first nationwide UK study of incidence and prevalence of systemic sclerosis, and the first to use a prospectively collected healthcare database. Our estimate of incidence among adults is more than 4 times higher than the previous estimate of 4 per million from the West Midlands in the 1980s [1], and our estimate of incidence among children was 10 times higher than the previous estimate of 0.27/million across the whole UK in 2010 [31]. Our estimate of prevalence is 3.5 times higher than the most recent estimate of 88 per million from North East England in 2004 [5]. The lower incidence and prevalence reported previously are likely to have a methodological explanation: those studies probably under-estimated cases because they relied either on physicians (nationally or in a limited geographical area) being asked to record cases of systemic sclerosis attending their clinics and report them to the authors [1,31], or on physician recall [5], neither of which is as reliable as prospectively recorded diagnoses in the patient's general practice record.
The higher incidence and prevalence among women than men is well-recognized. Ours is the first study to report the effect of socioeconomic status on the incidence of systemic sclerosis; we observed no effect of socioeconomic status, as estimated by area-based deprivation data which is used as a proxy for socioeconomic status and social class. This lack of effect is in contrast to other autoimmune diseases such as rheumatoid arthritis and systemic lupus erythematosus where increasing deprivation is associated with increased incidence and disease severity [32][33][34]. Survival at 1, 5, and 10 years of 94.2, 80.0, and 65.7% respectively is very similar to the pooled estimates from an international meta-analysis of mortality in systemic sclerosis using individual patient data [35]. Mortality was increased in older people, and possibly men compared to women, but unaffected by socioeconomic status.
The mean age at diagnosis in our study was 58 years (SD 16.4 years). While this is consistent with a previous UK study [5] and a study using inpatient and outpatient healthcare databases in Northern Italy [3], it is higher than the established teaching that onset is greatest from early adulthood to the 4th decade [36] and higher than most recent studies, which report the mean age of onset as 45-48 years [1,6,7]. The reason for this is unclear. It could be that there are longer delays between symptom onset and diagnosis in our healthcare system than in others. There may also be a delay between the diagnosis being made and it being recorded in the CPRD. Such a delay has recently been quantified in rheumatoid arthritis in the CPRD and was found to be < 2 years [37].
Systemic sclerosis is an important condition for rheumatologists, because although rare, it has a very high mortality compared to other musculoskeletal diseases [38], and optimal patient care is challenging and involves multidisciplinary effort. We have found a much higher incidence and prevalence of systemic sclerosis in the UK than previously reported [38] and in addition have estimated that the burden on our healthcare services will increase by 25% over the next 20 years as a result of changes in population demographics.

Strengths
This study is the first nationwide European study and largest of its kind in Europe. The main strength of our study is the use of the CPRD dataset. It is the largest primary care database, containing more than 13 million patient records and covering approximately 6% of the UK population. It has been validated as a representative dataset so results can be applied to the UK population as a whole [20]. It has good quality demographic data, allowing us to study the influence of age, sex, and socioeconomic status on incidence, prevalence, and survival of systemic sclerosis. It has allowed us to conduct a population-based study in a prospectively collected dataset, which avoids the selection bias of cohort studies from tertiary referral centers and recall bias of studies relying on physician memory.

Limitations
Systemic sclerosis is a rare disease, and the number of cases identified is small in comparison with some other conditions. Despite this, we have identified a cohort of more than 1300 people with systemic sclerosis, which allows us to make the most precise estimates of incidence, prevalence, and survival published in a European population, which are essential for health service planning. Within the CPRD, it is not possible to externally validate the date of diagnosis and accuracy of diagnosis of systemic sclerosis. In the past, it was possible to request anonymized sets of hospital correspondence, but this service has been withdrawn by the CPRD to increase the confidentiality of the database. Previous studies of the accuracy of the recording of diagnosis of similar chronic diseases in the CPRD have shown positive predictive values (PPVs) > 90% [39]. For example, the PPVs of codes for granulomatosis with polyangiitis (formerly known as Wegener's granulomatosis) and idiopathic thrombocytopenic purpura were both found to be 91% [23,24]. There is no reason to believe it should be dissimilar in systemic sclerosis, because a diagnosis of any of these serious autoimmune diseases is very unlikely to be recorded by a GP without confirmation from secondary care, where the diagnosis would have been made [25,40].
Table footnote (odds ratios): Crude odds ratio is calculated using univariable logistic regression, and adjusted odds ratio is calculated using multi-variable logistic regression including sex, age group, and IMD-quintile as a priori confounders. (a) From multi-variable (adjusted) analysis using the likelihood ratio test.
Table footnote (incidence rate ratios): Crude incidence rate is calculated using univariable Poisson regression, and the adjusted rate ratio is calculated using multi-variable Poisson regression including sex, age-group, year of diagnosis and IMD-quintile as a priori confounders. (a) From multi-variable (adjusted) analysis using the likelihood ratio test.
It is therefore also not possible to apply classification criteria, but our study represents people considered to have systemic sclerosis by their physicians. It is also unknown how many people may be undiagnosed, which means that our estimates, like all others, are likely to be underestimates of incidence and prevalence.
The Read codes do not allow cases to be differentiated into diffuse cutaneous and limited cutaneous phenotypes of systemic disease; however, we have excluded morphoea and other localized forms of cutaneous only disease. Our study therefore contains people with systemic sclerosis but cannot comment on differences between diffuse and limited cutaneous phenotypes.

Conclusion
We found the UK incidence and prevalence of systemic sclerosis to be higher than previously reported in the UK but similar to other recent USA and European estimates [14,15]. Our findings suggest that systemic sclerosis is not less common in Europe than in the USA and Australia and not less common in northern Europe compared to southern Europe.
Table footnote (hazard ratios): Crude hazard ratio is calculated using univariable Cox regression, and adjusted hazard ratio is calculated using multi-variable Cox regression including sex, age group, and IMD-quintile as a priori confounders. (a) From multi-variable (adjusted) analysis, using the likelihood ratio test. (b) Quintile 1 is the most deprived, Quintile 5 is the least deprived.