Validation of multiple sclerosis diagnoses in the Swedish National Patient Register

Population-based registers are widely used in epidemiological studies. We aimed to estimate the validity of multiple sclerosis (MS) diagnoses registered in the Swedish National Patient Register (NPR) by two sequential register-based case-definition algorithms. Prevalent MS patients aged 16–64 years were identified from the in- and specialised out-patient NPR in 2001–2013, using International Classification of Diseases code G35. These identified MS diagnoses were validated through two sequential register-based case-definition algorithms, as the ‘gold-standard’ reference, by linking individual-level data longitudinally to other nationwide registers. The primary algorithm first sought to corroborate the MS diagnoses with MS-specific information in other nationwide registers. The exploratory secondary algorithm identified individuals with MS-related information in other registers and those who could not be followed sufficiently. Through multi-register linkage, we estimated the number of confirmed and uncertain individuals with an MS diagnosis recorded in the NPR. A total of 19,781 individuals (mean age at first visit 45.2 years; 69.5% women) had at least one MS diagnosis recorded in the NPR during 2001–2013. Using the two case-definition algorithms, 92.5% (n = 18,291) of the MS diagnoses recorded in the NPR were confirmed, while 7.5% (n = 1490) remained uncertain. Our findings indicate that a very high percentage of patients coded with an MS diagnosis in the Swedish NPR actually have MS, and support the use of the NPR as a viable source to identify individuals with an MS diagnosis for population-based research. This exploratory methods paper suggests an alternative novel method to verify individuals’ diagnoses in register-based settings. Electronic supplementary material The online version of this article (10.1007/s10654-019-00558-7) contains supplementary material, which is available to authorized users.


Background
The Swedish nationwide population-based registers are widely used in epidemiological studies, providing a less resource-demanding data source than patient records to establish representative scientific results [1][2][3][4]. Nonetheless, administrative registers may be subject to diagnostic and coding errors [5]. Individual-level linkage across the different registers can be used to corroborate the information contained within a single register. The National Patient Register (NPR), a principal data source for research [4,5], contains records of healthcare visits in Sweden with diagnoses registered by the International Classification of Diseases (ICD) codes. The validity of the complete NPR is high [5], but knowledge of data quality is lacking for some specific diseases, such as multiple sclerosis (MS).
MS is a neurodegenerative disease that typically affects younger adults, causing increasing levels of cognitive and physical impairment as the disease progresses [6][7][8]. Sweden has an especially high prevalence [7], with an estimated all-age nationwide MS prevalence of 189 per 100,000 [9]. Although MS diagnoses are now being set earlier and more accurately, no single clinical feature or diagnostic test can positively diagnose MS [10][11][12]. This clinical context may reduce our certainty of the recorded MS diagnoses due to the risk of misdiagnosis between MS and other degenerative neurological diseases, a risk which has been suggested to still exist with the 2017 update of the McDonald criteria [10,13]. Therefore, it has been questioned to what extent an MS diagnosis is incorrectly assigned and recorded in administrative records during the period investigating whether a patient has MS [10,13-18].
Besides the NPR, several other Swedish nationwide registers contain MS-specific information (i.e., MS ICD codes or medications), which could be used to identify MS cases. Accordingly, we aimed to assess the validity of MS diagnoses recorded in the NPR with a novel method using information linked from other nationwide registers in two sequential register-based case-definition algorithms.

Methods
This validation study evaluates the NPR as a relevant source for MS studies by comparing the records of all individuals aged 16-64 with an MS code during 1 January 2001-31 December 2013 in the NPR against other register-based MS records, through two sequential register-based case-definition algorithms as the 'gold-standard' reference.

Data sources
The unique personal identity number assigned to all residents in Sweden enabled individual-level linkages across six nationwide registers [19], administered by the following four authorities:
• National Board of Health and Welfare: The National Patient Register (NPR) includes information about healthcare visits, including main and all side diagnoses coded according to the Swedish version of the ICD at time of visit [20]. The NPR has had nationwide coverage of in-patient admissions since 1987 and of out-patient specialist visits since 2001 [5]. The Swedish Prescribed Drug Register (SPDR) contains nationwide records of dispensed medication prescriptions with Anatomical Therapeutic Chemical (ATC) codes since 1 July 2005 [21].
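The individual-level linkage described above can be sketched as a grouping of records on a shared identifier. The following is a minimal illustration only, assuming each register is available as a list of records carrying a pseudonymised identifier; the field names (`pid`, `icd`, `atc`), the placeholder ATC value, and the example records are hypothetical and are not taken from the registers.

```python
from collections import defaultdict

def link_registers(registers):
    """Group records from several registers by a shared person identifier.

    `registers` maps a register name to a list of record dicts, each with a
    hypothetical `pid` field standing in for the personal identity number.
    """
    linked = defaultdict(lambda: defaultdict(list))
    for name, records in registers.items():
        for record in records:
            linked[record["pid"]][name].append(record)
    return linked

# Hypothetical example records (not real data).
registers = {
    "NPR": [
        {"pid": "A", "date": "2003-05-02", "icd": "G35"},
        {"pid": "B", "date": "2010-11-17", "icd": "G35"},
    ],
    "SPDR": [
        {"pid": "A", "date": "2006-01-09", "atc": "ATC_PLACEHOLDER"},
    ],
}

linked = link_registers(registers)
# For each person, check whether any record exists in the SPDR.
has_spdr = {pid: "SPDR" in regs for pid, regs in linked.items()}
print(has_spdr)
```

Grouping by identifier first keeps all of a person's records from every register together, which is what later case-definition steps would need to inspect.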

Study population
All individuals aged 16-64 years with a main or side diagnosis of MS (ICD-10: G35) recorded at a healthcare visit in 2001-2013 were identified in the NPR (n = 19,781). From the date of first MS diagnosis in 2001-2013 in the NPR, both retrospective and prospective information was obtained from the six above-mentioned registers in order to confirm the MS diagnosis. If it was not possible to confirm the MS diagnosis, the individual was followed until emigration, death, or end of data extraction of each register (which varied between 2013 and 2016), whichever came first (Fig. 1).
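The inclusion window and the end-of-follow-up rule above can be expressed compactly. The sketch below is illustrative only: the function names and arguments are hypothetical, and the register end date is passed in because it differs between registers.

```python
from datetime import date

STUDY_START = date(2001, 1, 1)
STUDY_END = date(2013, 12, 31)

def in_study_population(age_at_visit, visit_date):
    """True if a visit with an MS code falls in the age and date window."""
    return 16 <= age_at_visit <= 64 and STUDY_START <= visit_date <= STUDY_END

def end_of_follow_up(register_end, emigration=None, death=None):
    """Return the earliest of emigration, death, and end of register data."""
    candidates = [register_end]
    if emigration is not None:
        candidates.append(emigration)
    if death is not None:
        candidates.append(death)
    return min(candidates)

# Hypothetical individual who emigrated before the register's end date.
print(end_of_follow_up(date(2016, 12, 31), emigration=date(2010, 6, 1)))
```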

Statistical analyses
We  1). These algorithms, applied in a sequential manner, were thought to be equivalent to a diagnostic test to confirm the registered diagnoses [30]. Accordingly, the MS diagnoses recorded in the NPR were treated as 'provisional' MS until validated as 'confirmed' MS as per the algorithms. Figure 2 displays the primary and secondary sets of case-definition algorithms, including the order of the steps identifying confirmed MS and the 'gold-standard' data source used [1]. The primary algorithm identified MS-specific codes (ICD-10 G35, ICD-9 340, or specific ATC codes for MS medications) in other registers in six steps. In the first step of this algorithm, individuals with clinical MS information, entered by their treating neurologist, in the SMSReg were identified. The secondary case-definition algorithm was then applied to the provisional MS diagnoses remaining after completion of the primary algorithm. This exploratory algorithm matched to other MS-related information (MS-like symptom ICD codes), and sought to identify individuals who could not be followed sufficiently in the data to confirm the diagnosis, classifying these as 'plausible' MS. The individuals who remained at the end of the last step in each set of case-definition algorithms were classified as 'uncertain' MS (i.e., patients who received at least one MS diagnosis recorded in the NPR, but without other register-based information corroborating the diagnosis). The confirmed MS are those identified according to both case-definition algorithms, i.e., those who had other register-based MS information supporting their MS diagnosis in the NPR. We then plotted the proportions of confirmed, plausible, and uncertain MS by the year of first MS record in the NPR.
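The sequential logic of the two algorithms can be illustrated as an ordered list of rules in which an individual is classified at the first step that matches. This is a sketch under stated assumptions, not the study's implementation: only the first two primary steps (SMSReg record; ≥ 3 MS visits in the NPR) are taken from the text, the remaining primary steps are deliberately omitted, and all field names are hypothetical.

```python
def classify(person, primary_steps, secondary_steps):
    """Return (status, step label) from the first matching step, in order."""
    for label, rule in primary_steps:
        if rule(person):
            return ("confirmed", label)
    for label, rule in secondary_steps:
        if rule(person):
            return ("plausible", label)
    return ("uncertain", None)

# Only the first two primary steps are named in the text; the remaining
# four steps are not reproduced here. Field names are hypothetical.
primary_steps = [
    ("SMSReg record", lambda p: p["in_smsreg"]),
    (">=3 MS visits in NPR", lambda p: p["npr_ms_visits"] >= 3),
]
secondary_steps = [
    ("MS-like symptom ICD code", lambda p: p["ms_like_symptom_code"]),
    ("insufficient follow-up", lambda p: p["insufficient_follow_up"]),
]

# Hypothetical individual with one NPR visit and no other register support.
person = {
    "in_smsreg": False,
    "npr_ms_visits": 1,
    "ms_like_symptom_code": False,
    "insufficient_follow_up": False,
}
print(classify(person, primary_steps, secondary_steps))
```

Because the steps are evaluated in order, each individual is counted once, at the earliest step that corroborates the diagnosis.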
Finally, in order to estimate the influence of the uncertain MS cases in the NPR if included in study populations, we profiled the descriptive characteristics of those remaining as uncertain MS after each set of algorithms and those confirmed MS (including plausible MS diagnoses). The similarity of the characteristics of uncertain MS after the secondary algorithm and those of confirmed MS was tested using chi-square tests. The mean number of years between the first and second MS record in the NPR, with 95% confidence intervals (CI), was calculated. All analyses were conducted in SAS v.9.4.

Results
(Fig. 3). In 2013, the estimated prevalence was 258.3 (women 369.1; men 150.8) per 100,000. Women had a higher prevalence than men throughout the study.
Of the identified individuals, 92.5% (n = 18,291) had their diagnosis confirmed by the validation algorithms (Table 1). The primary algorithm confirmed the majority of diagnoses (17,922; 90.6%), with fewer corroborated by the secondary algorithm (369; 1.9%). We classified 1490 (7.5%) individuals as uncertain MS after cross-checking both case-definition algorithms. After the first two steps of the primary algorithm, MS diagnosis in the SMSReg and ≥ 3 visits due to MS recorded in the NPR, 89.4% (n = 17,692) of the provisional MS diagnoses were confirmed.
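The reported counts and percentages are internally consistent, which can be checked with simple arithmetic. The snippet below uses only the counts stated above; the variable and function names are illustrative.

```python
total = 19781            # individuals with at least one MS diagnosis in the NPR
primary = 17922          # confirmed by the primary algorithm
secondary = 369          # confirmed by the secondary algorithm
uncertain = 1490         # remaining uncertain after both algorithms
first_two_steps = 17692  # confirmed after the first two primary steps

confirmed = primary + secondary

def pct(n, denominator=total):
    """Percentage of the denominator, rounded to one decimal place."""
    return round(100 * n / denominator, 1)

for label, n in [("confirmed", confirmed), ("primary", primary),
                 ("secondary", secondary), ("uncertain", uncertain),
                 ("first two steps", first_two_steps)]:
    print(f"{label}: {n} ({pct(n)}%)")
```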
The profiles of the confirmed and uncertain MS are presented in Table 2. The mean number of years between the first and second visit (2001-2013) in the NPR for the uncertain MS was 0.85 (95% CI 0.70-1.00) after the primary case-definition algorithm and 1.01 (95% CI 0.78-1.24) after the secondary case-definition algorithm. Of the individuals with confirmed MS (n = 18,291), 16,527 (90.4%) had MS as the main diagnosis and 14,921 (81.6%) first presented to out-patient healthcare. Most of the uncertain MS had only one visit due to an MS diagnosis recorded in the NPR (78.5%, n = 1169) (not presented in the table).
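A mean with a 95% CI of the kind reported above can be computed as follows. This is a sketch that assumes a normal-approximation interval (mean ± 1.96 × standard error); the text does not state which interval method was used, and the input values are hypothetical, not study data.

```python
import math
import statistics

def mean_ci95(values):
    """Mean with a normal-approximation 95% CI (mean +/- 1.96 x SE).

    The interval method is an assumption made for illustration.
    """
    n = len(values)
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # sample SD / sqrt(n)
    return mean, mean - 1.96 * se, mean + 1.96 * se

# Hypothetical gaps, in years, between first and second MS record.
gaps = [0.5, 1.0, 1.5]
mean, lower, upper = mean_ci95(gaps)
print(f"{mean:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```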
A peak in first time NPR-registered MS diagnoses was observed when out-patient healthcare was included in 2001 (Online Resource 1).

Discussion
In this study we identified all individuals aged 16-64 with MS as a main or side diagnosis in the Swedish NPR during 2001-2013. We compared these provisional MS diagnoses against individual-level MS-specific information in other nationwide registers. With this novel method, we confirmed 92.5% of the NPR-registered MS diagnoses. The first two steps of the primary case-definition algorithm (MS diagnosis in the SMSReg and ≥ 3 visits with MS ever in the NPR) confirmed the majority of provisional MS diagnoses. The exploratory secondary case-definition algorithm identified a small number of plausible MS diagnoses, predominantly in the later years, with insufficient follow-up time to reasonably identify MS information in the registers. The individuals with uncertain MS after both case-definition algorithms could plausibly represent patients who were initially suspected of having MS but whose diagnosis was never confirmed, or could simply reflect administrative error. In total, 78.5% of the uncertain MS had only one visit ever due to MS.
A previous cross-referencing of the SMSReg to the NPR by the National Board of Health and Welfare found that only 4% of MS entries in the NPR in 2008 were coded as 'possible MS' in the SMSReg, suggesting high validity of MS diagnoses recorded in the NPR [26]. Accordingly, when selecting a study population, the voluntary SMSReg may underestimate the MS population, especially for particular geographical regions [26], and, based on this study, we add that the mandated NPR may slightly overestimate the MS population (< 10%). Further studies are needed to clarify the extent of this overestimation.
Our prevalence estimates of MS for 16-64 year olds were higher than an all-age estimate of 189 per 100,000 at 31 December 2008 with MS identified in either the SMSReg or NPR [9]. Improved diagnostic criteria, better awareness, more efficacious disease-modifying therapies (DMTs), and ultimately extended survival could explain this difference [6,7]. The high prevalence of MS among working-aged individuals and the increasing treatment options necessitate representative real-world results [6], which the NPR enables [1,2].
Strengths of this study include the numerous nationwide registers with MS-specific information available to corroborate the MS diagnoses in the NPR. Usually, a 'gold-standard' of disease is derived from clinical data, albeit for a sample of the total population. Instead, we used a novel method of register-based algorithms linking individual-level register data, including MS-specific clinical information from the SMSReg [25], to determine the validity of all identified MS cases in Sweden in 2001-2013. This study included years both before and after identification in the NPR, a design consistent with the chronic and progressive nature of MS [31] and treatment guidelines for annual healthcare visits for individuals with MS or clinically isolated syndrome (CIS) [32].
One limitation was that the case-definition algorithms were limited by the different coverage dates of the contributing registers. For example, the SPDR was available for July 2005-December 2013 and only covered dispensed prescribed drugs, thereby excluding both indented drugs administered at clinics and DMTs that became available after 2013. Further, both MS and CIS patients may be prescribed interferon beta therapies [32]; therefore, these ATC codes could not be used to confirm MS diagnoses.
In conclusion, the certainty of MS diagnoses for patients aged 16-64 in the NPR is very high according to this validation using register-based algorithms. These findings strengthen the notion that the Swedish NPR is a valuable source for MS populations in nationwide epidemiological studies.