Background

The Swedish nationwide population-based registers are widely used in epidemiological studies, providing a less resource-demanding data source than patient records for establishing representative scientific results [1,2,3,4]. Nonetheless, administrative registers may be subject to diagnostic and coding errors [5]. Individual-level linkage across the different registers can be used to corroborate the information contained within a single register. The National Patient Register (NPR), a principal data source for research [4, 5], contains records of healthcare visits in Sweden with diagnoses registered by International Classification of Diseases (ICD) codes. The overall validity of the NPR is high [5], but knowledge of data quality is lacking for some specific diseases, such as multiple sclerosis (MS).

MS is a neurodegenerative disease that typically affects younger adults, causing increasing levels of cognitive and physical impairment as the disease progresses [6,7,8]. Sweden has an especially high prevalence [7], with an estimated all-age nationwide MS prevalence of 189 per 100,000 [9]. Although MS is now diagnosed earlier and more accurately, no single clinical feature or diagnostic test can positively diagnose MS [10,11,12]. This clinical context may reduce our certainty in recorded MS diagnoses because of the risk of misdiagnosis between MS and other degenerative neurological diseases. It has been suggested that this risk persists with the 2017 update of the McDonald criteria [10, 13]. It has therefore been questioned to what extent an MS diagnosis is incorrectly assigned and recorded in administrative records while a patient is being investigated for MS [10, 13,14,15,16,17,18].

Besides the NPR, several other Swedish nationwide registers contain MS-specific information (i.e., MS ICD codes or medications), which could be used to identify MS cases. Accordingly, we aimed to assess the validity of MS diagnoses recorded in the NPR with a novel method using information linked from other nationwide registers in two sequential register-based case-definition algorithms.

Methods

This validation study evaluates the NPR as a source for MS studies. It compares the NPR records of all individuals aged 16–64 with an MS code during 1 January 2001–31 December 2013 against other register-based MS records. Two sequential register-based case-definition algorithms serve as the 'gold-standard' reference.

Data sources

The unique personal identity number assigned to all residents in Sweden enabled individual-level linkages across six nationwide registers [19], administered by the following four authorities:

  • National Board of Health and Welfare: The National Patient Register (NPR) includes information about healthcare visits, including the main diagnosis and all side diagnoses. These are coded according to the Swedish version of the ICD in use at the time of the visit [20]. The NPR has had nationwide coverage of in-patient admissions since 1987 and of specialised out-patient visits since 2001 [5]. The Swedish Prescribed Drug Register (SPDR) contains nationwide records of dispensed medication prescriptions with Anatomical Therapeutic Chemical (ATC) codes since 1 July 2005 [21]. The Cause of Death Register (CDR) contains nationwide information, recorded by ICD codes, on the date and the underlying and contributory causes of death since 1961 [22].

  • Karolinska University Hospital: The Swedish MS Register (SMSReg) is a nationwide but voluntary MS-specific clinical register. It is used for nationwide pharmacological surveillance and to enhance quality of care [23, 24]. To support clinical decision-making, the SMSReg contains comprehensive clinical information [25] on included patients diagnosed with MS. This includes retrospective information predating the SMSReg's establishment in 2001 [26]. The accuracy and completeness of its clinical data were recently assessed and judged to be of value for future studies [25].

  • Social Insurance Agency: The Micro-Data for Analysis of the Social Insurance System (MiDAS) contains information on sickness absence benefits (since 2005) and disability pension benefits (since 1994), including dates and diagnoses by ICD codes [27].

  • Statistics Sweden: The Longitudinal Integration Database for Health Insurance and Labour Market Studies (LISA) comprises annual socio-demographic information on all people registered as living in Sweden [28, 29].

Study population

All individuals aged 16–64 years with a main or side diagnosis of MS (ICD-10: G35) recorded at a healthcare visit in 2001–2013 were identified in the NPR (n = 19,781). We used the date of the first MS diagnosis in the NPR in 2001–2013 as the reference point. To confirm the MS diagnosis, both retrospective and prospective information was obtained from the six registers described above. If the MS diagnosis could not be confirmed, the individual was followed until emigration, death, or the end of data extraction for each register, whichever came first (Fig. 1). The end of data extraction varied between 2013 and 2016.
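The censoring rule above can be sketched as follows. This is a minimal illustration only, assuming dates are `datetime.date` objects. The function name and its arguments are hypothetical, and this is not the study's SAS code.

```python
from datetime import date
from typing import Optional


def end_of_follow_up(register_end: date,
                     emigration: Optional[date] = None,
                     death: Optional[date] = None) -> date:
    """Hypothetical helper: follow-up ends at the earliest of emigration,
    death, or the end of data extraction for the register being searched."""
    candidates = [d for d in (emigration, death, register_end) if d is not None]
    return min(candidates)


# Hypothetical individual who died before the register's data extraction ended.
print(end_of_follow_up(date(2016, 12, 31), death=date(2014, 3, 1)))
```

Because the end of data extraction differs by register (2013 to 2016), the same individual can have a different follow-up end date in each register searched.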

Fig. 1

Data coverage and availability from the nationwide register sources in relation to the study period. Notes The study period refers to the dates used to identify MS cases in the Swedish National Patient Register (NPR) among people aged 16–64 at the time of the visit. Six nationwide registers containing individual-level data were used. The NPR (red arrows) contains information on healthcare visits according to ICD codes, with different coverage dates depending on the healthcare setting. MS healthcare visits until 31 December 2013 were included from both the in-patient (nationwide since 1987) and specialised out-patient (included since 2001) healthcare settings. The Swedish Prescribed Drug Register (SPDR) (orange arrow) contains nationwide records of dispensed medication prescriptions with Anatomical Therapeutic Chemical (ATC) codes since 1 July 2005; these data were available until 31 December 2013. The Cause of Death Register (CDR) (aqua arrow) contains nationwide information since 1961 on the date and underlying cause of death, including contributory causes, according to ICD codes; ICD codes were not available after 2016. The nationwide voluntary clinical quality register, the Swedish MS Register (SMSReg) (green arrow), was established in 2001. It contains comprehensive clinical information on MS-related care for the included patients, including retrospective information from selected neurology clinics predating the SMSReg's creation, and was available until September 2014. Micro-Data for Analysis of the Social Insurance System (MiDAS) (blue arrows) covers diagnoses for disability pension (DP) benefits from 1994 and for sickness absence (SA) benefits from 2005, and was available until 2014. The Longitudinal Integration Database for Health Insurance and Labour Market Studies (LISA) (purple arrow) contains annual individual-level information on the socio-demographics of the total population registered as resident in Sweden as of 31 December, and was used for the years 2000–2013. ATC Anatomical Therapeutic Chemical, CDR Cause of Death Register, DP disability pension, ICD International Classification of Diseases, LISA Longitudinal Integration Database for Health Insurance and Labour Market Studies, MiDAS Micro-Data for Analysis of the Social Insurance System, MS multiple sclerosis, NPR National Patient Register, SA sickness absence, SPDR Swedish Prescribed Drug Register, SMSReg Swedish MS Register

Statistical analyses

We first described the MS patients using frequencies and percentages by the following characteristics:
  • sex (women/men);
  • healthcare setting (in-/out-patient);
  • whether MS was the main diagnosis at the first record with an MS diagnosis during 2001–2013 (yes/no);
  • number of visits with an MS diagnosis ever recorded in the NPR (< 3/≥ 3);
  • whether an MS diagnosis was recorded in the NPR before 2001 (yes/no);
  • whether the individual died before 31 December 2013 (yes/no);
  • age at the first record with MS in the NPR during 2001–2013.
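The sketch below is a hypothetical illustration of this descriptive step. It dichotomises the number of NPR visits with an MS diagnosis at the study's cut-off (< 3 vs ≥ 3) and tabulates counts with percentages. The function names and input values are assumptions for illustration, not study data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def visit_group(n_visits: int) -> str:
    """Dichotomise NPR visits with an MS diagnosis at the cut-off of 3."""
    return ">= 3" if n_visits >= 3 else "< 3"


def frequencies(values: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Return {category: (count, percentage rounded to one decimal)}."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: (n, round(100 * n / total, 1)) for k, n in counts.items()}


# Hypothetical visit counts for five individuals (not study data).
visits = [1, 4, 12, 2, 3]
print(frequencies(visit_group(v) for v in visits))
```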

The annual MS prevalence (total and sex-specific) was estimated for each year from 2001 to 2013 among all individuals aged 16–64 at the time of MS diagnosis according to the NPR. The prevalence estimates were expressed per 100,000 of the total population aged 16–64 in Sweden, as identified in LISA for the respective year.
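The prevalence measure above reduces to the number of cases divided by the population at risk, scaled to 100,000. This is a minimal sketch, assuming the case count and the LISA-based denominator refer to the same calendar year and age band. The function name is hypothetical.

```python
def prevalence_per_100k(cases: int, population: int) -> float:
    """Prevalence per 100,000: cases / population aged 16-64 x 100,000.

    Assumes both counts refer to the same year (and sex, for sex-specific rates).
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


# Hypothetical example: 50 cases among 20,000 residents aged 16-64.
print(round(prevalence_per_100k(50, 20_000), 1))
```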

Validation

The multidisciplinary research group constructed two sets (primary and secondary) of register-based case-definition algorithms for this study, to validate MS diagnoses recorded in the NPR using the extensive register data available. The algorithms cross-checked the MS diagnoses at the individual level against specific information on MS from all available nationwide register sources in Sweden, including data from before and after 2001–2013 (Fig. 1). Applied sequentially, these algorithms were considered equivalent to a diagnostic test for confirming the registered diagnoses [30]. Accordingly, the MS diagnoses recorded in the NPR were treated as 'provisional' MS until the algorithms validated them as 'confirmed' MS.

Figure 2 displays the primary and secondary sets of case-definition algorithms. It shows the order of the steps for identifying confirmed MS and the 'gold-standard' data source used [1]. The primary algorithm searched other registers for MS-specific codes (ICD-10 G35, ICD-9 340, or specific ATC codes for MS medications) in six steps. Its first step identified individuals with clinical MS information in the SMSReg, as entered by their treating neurologist.

The secondary case-definition algorithm was then applied to the provisional MS diagnoses that remained after the primary algorithm. This exploratory algorithm matched against other MS-related information (ICD codes for MS-like symptoms). It sought to identify, as 'plausible' MS, individuals who could not be followed long enough in the data to confirm the diagnosis.

Individuals who remained at the end of the last step of each set of case-definition algorithms were classified as 'uncertain' MS. These were patients with at least one MS diagnosis recorded in the NPR but no other register-based information corroborating the diagnosis. Confirmed MS comprised those identified by either case-definition algorithm, i.e., individuals with other register-based MS information supporting their MS diagnosis in the NPR. We then plotted the proportions of confirmed, plausible, and uncertain MS by the year of the first MS record in the NPR.
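The sequential logic described above can be sketched as follows: steps are tried in order, and the first step met decides the classification. This is a simplified, hypothetical illustration, not the study's implementation. Only the first two primary steps are named in the text (SMSReg, then ≥ 3 NPR visits with MS). The third primary step and the secondary step are placeholders standing in for the register checks shown in Fig. 2, and the record field names are assumptions.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Dict[str, Any]
Step = Tuple[str, Callable[[Record], bool]]

# Hypothetical, simplified primary steps (the study used six; see Fig. 2).
PRIMARY_STEPS: List[Step] = [
    ("MS in SMSReg", lambda r: bool(r.get("in_smsreg", False))),
    (">= 3 NPR visits with MS", lambda r: r.get("npr_ms_visits", 0) >= 3),
    # Placeholder for the remaining MS-specific ICD/ATC checks in other registers.
    ("MS-specific code in another register", lambda r: bool(r.get("ms_code_elsewhere", False))),
]
# Hypothetical secondary step: ICD codes for MS-like symptoms.
SECONDARY_STEPS: List[Step] = [
    ("MS-like symptom codes", lambda r: bool(r.get("ms_like_symptoms", False))),
]


def classify(record: Record) -> Tuple[str, Optional[str]]:
    """Return (class, deciding step) for one provisional NPR diagnosis.

    The primary algorithm is tried first, then the secondary one; records
    matched by neither are classified as uncertain MS.
    """
    for label, met in PRIMARY_STEPS:
        if met(record):
            return "confirmed", label
    for label, met in SECONDARY_STEPS:
        if met(record):
            return "plausible", label
    return "uncertain", None


if __name__ == "__main__":
    people = [
        {"in_smsreg": True, "npr_ms_visits": 1},
        {"npr_ms_visits": 5},
        {"npr_ms_visits": 1, "ms_like_symptoms": True},
        {"npr_ms_visits": 1},
    ]
    for p in people:
        print(classify(p))
```

Because the first matching step decides, the per-step counts in a table of this kind are mutually exclusive. In the study's overall tally, plausible MS is counted together with confirmed MS.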

Fig. 2

Primary and secondary case-definition algorithms, which together form the 'gold standard' for identifying confirmed MS diagnoses registered in the Swedish National Patient Register (NPR) by validation against individual-level information from other register data sources. Notes Patients with clinically isolated syndrome (CIS) should remain classified as uncertain MS and may be treated with a subset of DMTs. The following DMTs and ATC codes were therefore not included in the algorithm: interferon beta-1a (L03AB07), interferon beta-1b (L03AB08), and glatiramer acetate (L03AX13). ATC Anatomical Therapeutic Chemical, CDR Cause of Death Register, CIS clinically isolated syndrome, DMT disease-modifying therapies, ICD International Classification of Diseases, LISA Longitudinal Integration Database for Health Insurance and Labour Market Studies, MiDAS Micro-Data for Analysis of the Social Insurance System, MS multiple sclerosis, NPR National Patient Register, SPDR Swedish Prescribed Drug Register, SMSReg Swedish MS Register
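The ATC exclusion described in the notes to Fig. 2 can be sketched as a set difference applied before checking dispensations. This is a hypothetical sketch: the study's full list of MS-specific ATC codes is not given in the text, so it is passed in as a parameter. The code `HYPOTHETICAL_MS_ATC` is a placeholder, not a real ATC code.

```python
from typing import Iterable

# ATC codes excluded because CIS patients may also receive them (Fig. 2 notes).
EXCLUDED_ATC = frozenset({
    "L03AB07",  # interferon beta-1a
    "L03AB08",  # interferon beta-1b
    "L03AX13",  # glatiramer acetate
})


def confirms_by_medication(dispensed_atc: Iterable[str],
                           ms_specific_atc: Iterable[str]) -> bool:
    """True if any dispensed ATC code is MS-specific and not excluded."""
    usable = set(ms_specific_atc) - EXCLUDED_ATC
    return any(code in usable for code in dispensed_atc)


# "HYPOTHETICAL_MS_ATC" is a placeholder, not a real ATC code.
ms_specific = {"L03AB07", "HYPOTHETICAL_MS_ATC"}
print(confirms_by_medication(["L03AB07"], ms_specific))              # False
print(confirms_by_medication(["HYPOTHETICAL_MS_ATC"], ms_specific))  # True
```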

Finally, we estimated the potential influence of uncertain MS cases in the NPR if they were included in study populations. To do so, we profiled the descriptive characteristics of those remaining as uncertain MS after each set of algorithms and of those with confirmed MS (including plausible MS diagnoses). Differences in characteristics between uncertain MS after the secondary algorithm and confirmed MS were tested using chi-square tests. The mean number of years between the first and second MS record in the NPR was calculated with 95% confidence intervals (CIs). All analyses were conducted in SAS v.9.4.
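The two statistics named above can be illustrated with the Python standard library. This is a sketch under stated assumptions, not the authors' SAS analysis:
  • the chi-square test is a Pearson test without continuity correction for a 2×2 table (one degree of freedom), assuming no empty row or column;
  • the 95% CI uses a normal approximation (mean ± z · SD/√n).

The study does not state how its CIs were derived; a t-based interval would be marginally wider for small groups. The function names and input values are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom; assumes no empty
    row or column, so every expected count is positive.
    """
    table = [[a, b], [c, d]]
    n = a + b + c + d
    row = [a + b, c + d]
    col = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 df, chi-square = Z**2, so P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def mean_ci95(values: Sequence[float]) -> Tuple[float, float, float]:
    """Mean with a normal-approximation 95% CI: mean +/- z * SD / sqrt(n)."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    z = statistics.NormalDist().inv_cdf(0.975)
    return m, m - z * se, m + z * se


if __name__ == "__main__":
    # Hypothetical 2x2 table: a yes/no characteristic by group (not study data).
    print(chi_square_2x2(20, 10, 10, 20))
    # Hypothetical gaps in years between first and second NPR visit.
    print(mean_ci95([0.2, 0.5, 1.0, 1.5, 0.8]))
```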

Compliance with ethical standards

This project was approved by the Regional Ethical Review Board of Stockholm in accordance with the Declaration of Helsinki.

Results

Overall, 19,781 individuals (69.5% women) with MS diagnoses recorded in the NPR when aged 16–64 were identified in 2001–2013. In all, 81.7% (n = 16,158) were first identified from specialised out-patient visits, and 89.7% (n = 17,738) had MS as the main diagnosis. Together, they had 308,761 visits with MS as a listed diagnosis during 2001–2013.

The nationwide prevalence of MS according to the NPR in the population aged 16–64 years steadily increased over the study period (Fig. 3). In 2013, the estimated prevalence was 258.3 per 100,000 (women 369.1; men 150.8). Women had a higher prevalence than men throughout the study period.

Fig. 3

Prevalence of MS per 100,000 individuals aged 16–64 in 2001–2013, identified in the Swedish National Patient Register. MS multiple sclerosis

Of the identified individuals, 92.5% (n = 18,291) had their diagnosis confirmed by the validation algorithms (Table 1). The primary algorithm confirmed the majority of diagnoses (n = 17,922; 90.6%), and the secondary algorithm corroborated fewer (n = 369; 1.9%). After both case-definition algorithms had been applied, 1490 (7.5%) individuals were classified as uncertain MS. The first two steps of the primary algorithm were MS diagnosis in the SMSReg and ≥ 3 visits due to MS recorded in the NPR. After these two steps, 89.4% (n = 17,692) of the provisional MS diagnoses were confirmed.
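The counts above can be reconciled with a short worked check. The three groups partition the 19,781 individuals, and each percentage is the group count divided by that total. The sketch uses only figures stated in this paragraph; the variable and function names are illustrative.

```python
TOTAL = 19_781  # individuals with an MS diagnosis in the NPR, 2001-2013

counts = {
    "confirmed (primary algorithm)": 17_922,
    "plausible (secondary algorithm)": 369,
    "uncertain": 1_490,
}


def pct(part: int, whole: int = TOTAL) -> float:
    """Percentage of the NPR-identified population, rounded to one decimal."""
    return round(100 * part / whole, 1)


# The three groups should partition the study population exactly.
assert sum(counts.values()) == TOTAL

for group, n in counts.items():
    print(f"{group}: {n} ({pct(n)}%)")

confirmed_total = counts["confirmed (primary algorithm)"] + counts["plausible (secondary algorithm)"]
print(f"confirmed overall: {confirmed_total} ({pct(confirmed_total)}%)")
print(f"after the first two primary steps: 17692 ({pct(17_692)}%)")
```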

Table 1 Steps in the process to validate the MS diagnoses recorded in the Swedish National Patient Register (NPR) (2001–2013) (n = 19,781)a by two sequential register-based case-definition algorithms

The profiles of confirmed and uncertain MS are presented in Table 2. For individuals with uncertain MS, the mean time between the first and second visit (2001–2013) in the NPR was 0.85 years (95% CI 0.70–1.00) after the primary case-definition algorithm. After the secondary case-definition algorithm, it was 1.01 years (95% CI 0.78–1.24). Of the individuals with confirmed MS (n = 18,291), 16,527 (90.4%) had MS as the main diagnosis and 14,921 (81.6%) first presented to out-patient healthcare. Most individuals with uncertain MS had only one visit with an MS diagnosis recorded in the NPR (78.5%, n = 1169; not presented in the table).

Table 2 Characteristics of individuals identified during 1 January 2001–31 December 2013 with an MS diagnosis registered in the Swedish National Patient Register (NPR)a when aged 16–64, according to the register-based case-definition algorithms

A peak in first-time NPR-registered MS diagnoses was observed in 2001, when out-patient healthcare was included (Online Resource 1).

Discussion

In this study, we identified all individuals aged 16–64 with MS as a main or side diagnosis in the Swedish NPR during 2001–2013. We compared these provisional MS diagnoses against individual-level MS-specific information in other nationwide registers. With this novel method, 92.5% of the NPR-registered MS diagnoses could be validated by the two sequential register-based case-definition algorithms. Our findings support the NPR as a valuable source for identifying MS cases in epidemiological studies.

The majority of provisional and confirmed MS diagnoses were first recorded in out-patient settings, with MS as the main diagnosis. A slightly higher percentage of the uncertain MS diagnoses were side diagnoses. The spike in registered MS diagnoses in 2001 corresponds to the inclusion of specialised out-patient visits. By 2004 it tapers off, and the annual numbers of first-time identified diagnoses remain reasonably consistent for the remaining study years. This trend suggests that caution is required when identifying newly diagnosed MS cases from the NPR before 2004, i.e., during the first years in which out-patient visits were included in the NPR.

The first two steps of the primary case-definition algorithm (MS diagnosis in the SMSReg and ≥ 3 visits with MS ever recorded in the NPR) confirmed the majority of provisional MS diagnoses. The exploratory secondary case-definition algorithm identified a small number of plausible MS diagnoses, predominantly in the later study years. These individuals had insufficient follow-up time for MS information to be reasonably identified in the registers. Individuals with uncertain MS after both case-definition algorithms may be patients who were initially suspected of having MS but whose diagnosis was never confirmed. Alternatively, they may reflect administrative errors. In total, 78.5% of those with uncertain MS had only one visit ever due to MS.

The National Board of Health and Welfare previously cross-referenced the SMSReg against the NPR. Only 4% of MS entries in the NPR in 2008 were coded as 'possible MS' in the SMSReg, suggesting high validity of MS diagnoses recorded in the NPR [26]. When selecting a study population, the voluntary SMSReg may underestimate the MS population, especially in particular geographical regions [26]. Based on this study, we add that the mandatory NPR may slightly overestimate the MS population (by < 10%). Further studies are needed to establish how often this occurs.

Our prevalence estimates of MS among 16–64-year-olds were higher than an all-age estimate of 189 per 100,000 at 31 December 2008, which identified MS in either the SMSReg or the NPR [9]. Improved diagnostic criteria, better awareness, more efficacious disease-modifying therapies (DMTs), and ultimately extended survival could potentially explain this difference [6, 7]. The high prevalence of MS among working-age individuals and the increasing treatment options necessitate representative real-world results [6], which the NPR enables [1, 2].

Strengths of this study include the numerous nationwide registers with MS-specific information available to corroborate the MS diagnoses in the NPR. Typically, a 'gold standard' for a disease is derived from clinical data, albeit only for a sample of the population. Instead, we used a novel method of register-based algorithms linking individual-level register data, including MS-specific clinical information from the SMSReg [25]. This allowed us to determine the validity of all identified MS cases in Sweden in 2001–2013. The study included years both before and after identification in the NPR. This design is consistent with the chronic and progressive nature of MS [31] and with treatment guidelines recommending annual healthcare visits for individuals with MS or clinically isolated syndrome (CIS) [32].

One limitation was that the case-definition algorithms were constrained by the differing coverage periods of the contributing registers. For example, the SPDR was available only for July 2005–December 2013 and covers only dispensed prescribed drugs. It therefore excludes both requisitioned drugs administered at clinics and DMTs that became available after 2013. Further, both MS and CIS patients may be prescribed interferon beta therapies [32]; therefore, these ATC codes could not be used to confirm MS diagnoses.

In conclusion, according to this validation using register-based algorithms, the certainty of MS diagnoses in the NPR for patients aged 16–64 is very high. These findings strengthen the notion that the Swedish NPR is a valuable source for MS populations in nationwide epidemiological studies.