Background

Management of chronic disease is the major challenge facing health systems worldwide [1]. Many people with chronic disease have multiple chronic conditions, a state termed multimorbidity [2]. It is clear that multimorbidity is common and is associated with worse clinical outcomes and higher health care costs than either good health or the presence of a single chronic condition [3-6]. However, there are key knowledge gaps concerning the basic epidemiology of multimorbidity [7], its clinical and economic consequences, and how it contributes to disparities in health [8]. This information is a prerequisite for mitigating the impact of multimorbidity and chronic disease [9].

Multimorbidity has been identified as a key research priority by the Public Health Agency of Canada, and information about it is crucial to inform programming and resource forecasting. Knowledge of secular changes in the incidence and prevalence of multimorbidity is required, and would be facilitated by methods for identifying multimorbidity using administrative data.

Identifying morbidity using administrative data can be simple (e.g., based on a single hospitalization and using only a small number of codes) or complex (e.g., including inpatient and outpatient encounters and long lists of codes). Once developed, algorithms may be validated against a suitable gold standard (e.g., chart reviews; other previously validated algorithms).

Previous work by Barnett et al [3] identified 40 morbidities, selected on the basis of a systematic review of multimorbidity measures [10], the Quality and Outcomes Framework of the UK General Practice contract, and health service planning by NHS Scotland. However, these authors used administrative data sources unique to the United Kingdom to identify the presence or absence of these conditions, and no corresponding scheme based on the more widely used ICD-9 CM/ICD-10 system is available.

Therefore, we first identified previously validated algorithms that use the ICD-9 CM/ICD-10 system for ascertaining the presence of chronic conditions using inpatient and outpatient claims and utilization data. We then showed proof of concept for using administrative data to study multimorbidity, by applying these previously validated algorithms to a population of adults residing in Edmonton, Canada between April 2008 and March 2009.

Methods

The institutional review boards at the University of Alberta (Pro00038795) and the University of Calgary (E22590) approved the study.

Morbidities

We did a focused literature search for validated algorithms that use ICD-9 CM/ICD-10 codes in administrative data from inpatient and outpatient encounters to ascertain the presence or absence of the 40 morbidities identified by Barnett et al [3]. We searched MEDLINE using combinations of the following MeSH subject headings together with the specific morbidity of interest: ‘International Classification of Diseases’, ‘Reproducibility of Results’, and ‘Sensitivity and Specificity’. Based on an a priori decision, we considered algorithms to be of high validity if they had both positive predictive value (PPV) and sensitivity ≥70% as compared to an acceptable gold standard such as chart review. We considered algorithms to be of moderate validity if they had PPV ≥70% but sensitivity <70%. The cut-off values for PPV and sensitivity were based on previous validation studies of administrative data [11]. We did not consider negative predictive value or specificity, because these parameters are generally >90% in studies of chronic diseases among the general population [11,12]. The definition of multimorbidity required the coexistence of two or more of the morbidities. In a secondary analysis we used a more restrictive definition that required three or more morbidities to be present.
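The a priori validity rule and the two multimorbidity definitions can be written as small decision functions. This is a minimal sketch: the function names are hypothetical, PPV and sensitivity are expressed as proportions rather than percentages, and treating an algorithm with PPV <70% as ineligible is our reading of the text rather than an explicitly stated rule.

```python
from typing import Optional

# Cut-offs from the a priori rule (as proportions, i.e. 0.70 == 70%).
PPV_CUTOFF = 0.70
SENSITIVITY_CUTOFF = 0.70


def classify_validity(ppv: float, sensitivity: float) -> Optional[str]:
    """Return 'high', 'moderate', or None when the algorithm is not used."""
    if ppv >= PPV_CUTOFF and sensitivity >= SENSITIVITY_CUTOFF:
        return "high"
    if ppv >= PPV_CUTOFF:
        # PPV meets the cut-off but sensitivity is below 70%.
        return "moderate"
    # Assumption: PPV below 70% makes the algorithm ineligible.
    return None


def multimorbidity_flags(n_morbidities: int) -> dict:
    """Primary definition: >=2 morbidities; secondary (restrictive): >=3."""
    return {
        "primary": n_morbidities >= 2,
        "secondary": n_morbidities >= 3,
    }
```

For example, `classify_validity(0.80, 0.55)` returns `"moderate"`, while `classify_validity(0.60, 0.90)` returns `None` because a high sensitivity does not compensate for a low PPV under this rule.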

ICD codes

Canadian hospital discharge abstract data are coded with ICD-10 CA, which adds digits to ICD-10 codes and thereby allows more specific coding [11]. All of the ICD-10 codes from the included algorithms are consistent with ICD-10 CA codes, so we use ICD-10 and ICD-10 CA codes interchangeably throughout this manuscript. When ICD-10 codes were not given in the primary papers, we used the Canadian Institute for Health Information (CIHI; www.cihi.ca) conversion table to convert ICD-9 CM codes to ICD-10 codes. Many algorithms required multiple codes within a specified time period to establish that a morbidity was present (Table 1). In each case, the index date for the disease was considered to be the date of the first code. For example, to determine the presence of asthma, we searched for ICD-9 CM 493 and ICD-10 J45 codes in hospitalizations and outpatient encounters. We considered asthma to have developed at the first instance of either a single hospitalization with either of these codes, or a single outpatient encounter followed by two further outpatient encounters with either of these codes within two years. In either case, we considered the participant to have asthma for the rest of follow-up.
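The asthma rule and its index date can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the `Encounter` record and its `setting` labels are hypothetical, codes are matched by prefix with the decimal point removed, and "within two years" is approximated as 730 days counted from the first of the three outpatient encounters.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Code prefixes named in the text: ICD-9 CM 493 and ICD-10 J45.
ASTHMA_PREFIXES = ("493", "J45")
# Assumption: "within two years" approximated as 730 days.
WINDOW = timedelta(days=730)


@dataclass
class Encounter:
    day: date
    setting: str  # hypothetical labels: "inpatient" or "outpatient"
    code: str     # diagnosis code with the decimal point removed


def asthma_index_date(encounters: List[Encounter]) -> Optional[date]:
    """Earliest date on which the asthma definition is met, or None."""
    relevant = [e for e in encounters if e.code.startswith(ASTHMA_PREFIXES)]

    # Route 1: any single hospitalization with an asthma code.
    candidates = [e.day for e in relevant if e.setting == "inpatient"]

    # Route 2: one outpatient encounter followed by two further ones,
    # all three inside the window that starts at the first encounter.
    outpatient = sorted(e.day for e in relevant if e.setting == "outpatient")
    for i in range(len(outpatient) - 2):
        if outpatient[i + 2] - outpatient[i] <= WINDOW:
            candidates.append(outpatient[i])
            break  # dates are sorted, so this is the earliest qualifying start

    # Index date = date of the first code in the earliest qualifying pattern.
    return min(candidates) if candidates else None
```

With outpatient asthma codes on 1 March 2006, 1 September 2006 and 1 May 2007, the function returns `date(2006, 3, 1)`. Because a participant is then considered to have asthma for the rest of follow-up, prevalence on any later date reduces to checking `index is not None and index <= day`.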

Table 1 Administrative algorithms for 30 morbidities

Proof of concept

We applied the identified algorithms of high or moderate validity to a population-based administrative dataset from Alberta Health (AH; the provincial health ministry) and Alberta clinical laboratories. Details of this dataset, including claims, hospitalizations and Ambulatory Care Classification System (ACCS) utilization, are given in Figure 1 and have been reported elsewhere [13]. We assembled a cohort of all adults aged ≥18 years who were registered with AH and resided in the city of Edmonton, Alberta between April 2008 and March 2009. All Alberta residents are eligible for insurance coverage by AH, and >99% participate in this coverage. The dataset included demographic information (such as postal code of residence), laboratory data, and, for those aged ≥65 years, medication data [13]. We identified Edmonton residents from the AH registry file using the community name variable from the Statistics Canada Postal Code 2008 Conversion file [14] (www.statcan.gc.ca).
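Cohort selection amounts to a lookup from postal code to community name followed by an age filter. The sketch below is hypothetical throughout: it assumes one postal code per registrant, a pre-built dictionary derived from the conversion file, the label `"EDMONTON"`, and age measured on 1 April 2008 (the text does not state the reference date for age).

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class RegistryRecord:
    person_id: str
    birth_date: date
    postal_code: str


def age_on(birth_date: date, ref_date: date) -> int:
    """Completed years of age on ref_date."""
    years = ref_date.year - birth_date.year
    if (ref_date.month, ref_date.day) < (birth_date.month, birth_date.day):
        years -= 1  # birthday not yet reached in ref_date's year
    return years


def select_cohort(
    registry: List[RegistryRecord],
    postal_to_community: Dict[str, str],  # hypothetical lookup built from the conversion file
    ref_date: date = date(2008, 4, 1),    # assumed reference date for age
    community: str = "EDMONTON",          # assumed community-name label
    min_age: int = 18,
) -> List[str]:
    """IDs of registrants living in `community` and aged >= min_age on ref_date."""
    selected = []
    for rec in registry:
        # Normalise e.g. "t5a 0a1" -> "T5A0A1" before the lookup.
        key = rec.postal_code.replace(" ", "").upper()
        if postal_to_community.get(key) != community:
            continue
        if age_on(rec.birth_date, ref_date) >= min_age:
            selected.append(rec.person_id)
    return selected
```

A real implementation would also need to handle people who moved during the fiscal year, which this sketch ignores.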

Figure 1

Development of the cohort. ICD, International Classification of Diseases; CKD, chronic kidney disease. Barnett K, Mercer SW, Norbury M, Watt G, Wyke S, Guthrie B. Epidemiology of multimorbidity and implications for health care, research, and medical education: a cross-sectional study. Lancet. 2012;380(9836):37-43.

To demonstrate proof of concept for applying these algorithms to a large administrative dataset, we presented a simple summary of the prevalence of morbidity and multimorbidity in the study population. Counts and percentages were presented along with a figure showing how the number of morbidities varies by age. In sensitivity analyses, we presented the prevalence of morbidity as assessed by different sources of administrative data.
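The summary counts and percentages can be produced from a per-person set of identified conditions. This is a minimal sketch assuming a hypothetical input dictionary that maps a participant ID to the set of conditions flagged by the algorithms; note that the "three or more" group is a subset of the "two or more" group, so the four percentages are not meant to sum to 100.

```python
from collections import Counter
from typing import Dict, Set


def count_distribution(morbidities: Dict[str, Set[str]]) -> Dict[str, float]:
    """Percentage of people with 0, 1, >=2 and >=3 morbidities."""
    n = len(morbidities)
    if n == 0:
        raise ValueError("empty cohort")
    counts = [len(conds) for conds in morbidities.values()]
    return {
        "none": 100.0 * sum(c == 0 for c in counts) / n,
        "one": 100.0 * sum(c == 1 for c in counts) / n,
        "two_or_more": 100.0 * sum(c >= 2 for c in counts) / n,    # primary definition
        "three_or_more": 100.0 * sum(c >= 3 for c in counts) / n,  # secondary definition
    }


def condition_prevalence(morbidities: Dict[str, Set[str]]) -> Dict[str, float]:
    """Percentage of the cohort identified as having each condition."""
    n = len(morbidities)
    if n == 0:
        raise ValueError("empty cohort")
    tally = Counter(c for conds in morbidities.values() for c in conds)
    return {cond: 100.0 * k / n for cond, k in tally.items()}
```

Running the same functions on condition sets derived from different data sources (for example, hospitalizations only) gives the source-specific prevalence estimates used in the sensitivity analyses.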

Results

Algorithms

We identified 16 morbidities for which the best identified algorithm was of high validity: asthma, atrial fibrillation, metastatic cancer, chronic heart failure, chronic kidney disease, chronic pain, cirrhosis, diabetes, hypertension, irritable bowel syndrome, multiple sclerosis, myocardial infarction, peripheral vascular disease, psoriasis, schizophrenia and severe constipation. We identified an additional 14 morbidities (including two algorithms for other types of cancer and one algorithm for another type of liver disease) for which the best identified algorithm was of moderate validity: alcohol misuse, lymphoma, non-metastatic cancer (breast, cervical, colorectal, lung, and prostate), chronic pulmonary disease, chronic viral hepatitis B, dementia, depression, epilepsy, hypothyroidism, inflammatory bowel disease, Parkinson’s disease, peptic ulcer disease, rheumatoid arthritis, and stroke or transient ischemic attack. We excluded the remaining morbidities, for which no suitable algorithm could be identified (Additional file 1 Table S1). Thus we identified 30 conditions using administrative algorithms (including ICD-9 CM and ICD-10 codes), which are summarized in Table 1. Of these 30 algorithms, half were validated for both ICD-9 CM and ICD-10 codes.

We identified all conditions exclusively using ICD-9 CM and ICD-10 data with the exception of chronic kidney disease, for which we used a validated algorithm applied to ICD-9 CM and ICD-10 data [15] and supplemented using serum creatinine and albuminuria data [16]. We considered chronic kidney disease to be present if a participant met either the administrative or laboratory criteria.
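Because a participant counts as having chronic kidney disease if either criterion is met, the combination is a logical OR. In the sketch below the laboratory rule is a hypothetical placeholder (eGFR below 60 mL/min/1.73 m² or a positive albuminuria result) standing in for the cited laboratory definition [16], which is not reproduced here; a real definition would usually also require the abnormality to persist over time. Treating missing tests as not meeting the laboratory criterion is a design choice of this sketch, not something stated in the text.

```python
from typing import Optional

# Hypothetical placeholder for the cited laboratory definition.
EGFR_CUTOFF = 60.0  # mL/min/1.73 m^2


def meets_lab_criteria(egfr: Optional[float], albuminuria: Optional[bool]) -> bool:
    """Missing tests (None) are treated as not meeting the laboratory criterion."""
    low_egfr = egfr is not None and egfr < EGFR_CUTOFF
    return low_egfr or albuminuria is True


def ckd_present(meets_admin: bool, egfr: Optional[float], albuminuria: Optional[bool]) -> bool:
    """CKD is present if either the administrative or the laboratory criterion is met."""
    return meets_admin or meets_lab_criteria(egfr, albuminuria)
```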

In some cases, we made minor changes to the published algorithms to improve anticipated diagnostic performance, to increase consistency between the algorithms used for different conditions, and to extend their application to the outpatient setting. First, the original publications by Quan et al [11,17] required one hospitalization to identify the presence of a chronic condition; based on input from the first author of those papers, we modified these algorithms to allow either one inpatient code or two outpatient codes within two years to define the presence of these conditions. Second, to improve mapping of ICD-9 codes from the original (published) algorithm for chronic pain into ICD-10, we combined its ‘highly likely’ and ‘likely’ codes [18]. Third, for consistency, we modified algorithms that defined a condition as present if a participant had two codes at any time during follow-up (no matter how far apart) to require that the two codes occur within a three-year period [19,20]. Fourth, we expanded the criteria for presence of atrial fibrillation, epilepsy, irritable bowel syndrome, and severe constipation to include two outpatient codes within two years. However, to ensure that secondary (post-surgical) bowel complications were not incorrectly classified as chronic bowel conditions, we excluded any hospitalization for surgery when assessing the presence or absence of these conditions [21]. Fifth, we expanded the criteria for presence of stroke or transient ischemic attack (TIA) to include one outpatient code. Sixth, we reviewed all algorithms for overlapping codes (situations where the same code was used to identify more than one condition) and modified the algorithms to avoid double-counting of morbidities (see footnotes to Table 1 for specific details).
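Several of these modifications are variations on one pattern: one inpatient code, or a set number of outpatient codes within a time window, optionally ignoring surgical hospitalizations. The configurable sketch below is hypothetical (record type, field names and defaults are illustrative; the actual code lists and windows for each condition are in Table 1). Setting `window_days=1095` gives the three-year variant, and `min_outpatient=1` reproduces the single-outpatient-code rule used for stroke or TIA.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class ClaimRecord:
    day: date
    setting: str            # hypothetical labels: "inpatient" or "outpatient"
    code: str               # diagnosis code, decimal point removed
    surgical: bool = False  # hypothetical flag: hospitalization was for surgery


def index_date(
    records: List[ClaimRecord],
    prefixes: Tuple[str, ...],
    min_outpatient: int = 2,
    window_days: int = 730,          # ~2 years; 1095 for the 3-year variant
    exclude_surgical: bool = False,  # as described for the bowel conditions
) -> Optional[date]:
    """Earliest date at which one inpatient code, or `min_outpatient`
    outpatient codes within `window_days`, are observed; None if never."""
    if min_outpatient < 1:
        raise ValueError("min_outpatient must be at least 1")
    hits = [r for r in records if r.code.startswith(prefixes)]
    if exclude_surgical:
        hits = [r for r in hits if not (r.setting == "inpatient" and r.surgical)]

    candidates = [r.day for r in hits if r.setting == "inpatient"]

    outpatient = sorted(r.day for r in hits if r.setting == "outpatient")
    k = min_outpatient
    window = timedelta(days=window_days)
    for i in range(len(outpatient) - k + 1):
        # The k-th outpatient code counted from position i must fall in the window.
        if outpatient[i + k - 1] - outpatient[i] <= window:
            candidates.append(outpatient[i])
            break  # sorted dates: first qualifying start is the earliest

    return min(candidates) if candidates else None
```

Keeping the thresholds as parameters makes it easy to apply the same code to every condition while documenting, per condition, which modification was used.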

Application of the algorithms to the Edmonton cohort

The study cohort included 574,409 participants (Figure 1, Table 2). Almost two-thirds were less than 50 years of age, ten percent were 70 years of age or older, and the proportions of men and women were similar. Approximately half of all participants were not identified as having any of the 30 morbidities for which high or moderate validity algorithms existed, and approximately one quarter were identified as having one of these 30 morbidities. Another quarter were identified as having two or more of these 30 morbidities (meeting the primary criterion for multimorbidity), and 12% had three or more (meeting the secondary criterion for multimorbidity).

Table 2 Morbidity characteristics of participants residing in Edmonton, Alberta during the April 2008 to March 2009 fiscal year

The apparent prevalence of most morbidities was greatly reduced (often by 50% or more) when we assessed their presence or absence using hospitalization data only (Table 2). The addition of ACCS data to hospitalization and claims data made little difference to prevalence estimates, with the possible exceptions of chronic pain, hepatitis B and cirrhosis (the prevalences of which all changed by >20%). In most cases, the algorithms as originally validated resulted in a prevalence that was intermediate between the most inclusive approach (using hospitalization, claims and ACCS data) and the most restrictive approach (using hospitalization data only). As expected, adding gold standard laboratory data for kidney function (eGFR and albuminuria) resulted in substantial increases in the apparent prevalence of chronic kidney disease as compared to administrative data alone, regardless of which administrative data sources were used.
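The comparisons between data sources are relative changes in prevalence. A minimal sketch, with hypothetical function names, taking one set of prevalence estimates as the reference (for example, hospitalization plus claims data) and flagging conditions whose estimate changes by more than 20% under another set (for example, with ACCS data added):

```python
from typing import Dict, List


def relative_change(reference: float, alternative: float) -> float:
    """(alternative - reference) / reference, as a fraction."""
    if reference <= 0:
        raise ValueError("reference prevalence must be positive")
    return (alternative - reference) / reference


def changed_conditions(
    reference: Dict[str, float],
    alternative: Dict[str, float],
    threshold: float = 0.20,
) -> List[str]:
    """Conditions whose prevalence differs by more than `threshold` (relative)."""
    return sorted(
        cond
        for cond in reference
        if cond in alternative
        and abs(relative_change(reference[cond], alternative[cond])) > threshold
    )
```

With toy values, `relative_change(10.0, 4.0)` returns -0.6, i.e. a 60% reduction of the kind seen when only hospitalization data are used, and `changed_conditions({"a": 10.0, "b": 5.0}, {"a": 10.5, "b": 6.5})` returns `["b"]`.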

Figure 2 depicts the number of morbidities by age group. Hypertension was the most common morbidity (prevalence 23%), followed by chronic kidney disease (21%), diabetes (9%) and depression (9%).

Figure 2

Number of morbidities by age in Edmonton, Alberta during the April 2008 to March 2009 fiscal year.

Discussion

From a published list of chronic conditions [3], we identified a total of 30 validated algorithms, including 3 algorithms for different types of cancer and 2 for liver disease. We applied the algorithms to ICD codes from claims and utilization data and identified the presence or absence of these conditions in a cohort of 574,409 adults residing in Edmonton, Alberta between April 2008 and March 2009 (Figure 1). The overall prevalence of multimorbidity in this cohort was 26%, similar to that reported by Barnett et al [3]. Our findings demonstrate proof of concept for using administrative data as a surveillance tool for multimorbidity in settings with systems that reliably capture population-based claims and utilization data.

Multiple prior studies have ascertained the presence of various chronic conditions in the context of assessing multimorbidity [4,7,22-34]. Although there is no universally accepted definition of multimorbidity (or list of conditions that should be used to assess its presence), there appears to be consensus on several issues. First, health conditions used to define multimorbidity should be chronic but not necessarily permanent. Second, two or more concomitant conditions should be required to identify a person as having multimorbidity. Third, an attempt should be made to standardize definitions across studies to facilitate comparisons between populations [7,9,34,35]. At the same time, algorithms selected for use with administrative data should be validated against a gold standard and should demonstrate acceptable diagnostic properties, to ensure reasonably accurate classification of individuals with respect to morbidity status. We focused on validated algorithms with a positive predictive value of ≥70% compared to an acceptable gold standard such as chart review, preferring those that also had sensitivity ≥70%. Because we had access to laboratory data allowing a gold standard assessment of kidney function, we primarily assessed the presence of chronic kidney disease using estimated glomerular filtration rate (eGFR) and albuminuria rather than administrative data.

To our knowledge, this is the most comprehensive panel of validated algorithms yet applied to administrative data for the study of multimorbidity. Other studies have used reasonable but unvalidated algorithms, a more limited list of candidate chronic conditions, or both. Although other chronic conditions could undoubtedly be identified using administrative data, we focused on those for which available algorithms appear to have adequate positive predictive value and, for 16 of the 30 conditions, adequate sensitivity as well. We will use the set of algorithms described herein as the foundation for a series of studies describing the epidemiology of multimorbidity in Alberta, Canada.

Beyond their varying definitions of multimorbidity, existing studies in this area have several other limitations [36]. First, population-based studies are rare (especially in Canadian settings); most studies have captured patients followed by a particular centre and are vulnerable to referral bias. Second, most studies have been unable to assess the link between multimorbidity and clinical outcomes in subgroups defined by age, sex, or low socioeconomic status. Third, little is known about the relative frequency of individual chronic conditions within the multimorbidity syndrome, or about which clusters of conditions are most common or clinically significant. Fourth, studies examining the economic consequences of multimorbidity have typically used relatively unsophisticated methods, studied only select populations, or both. The scheme outlined in the current manuscript will allow our group to conduct future studies that address these knowledge gaps and inform policy and practice. We hope that researchers in other jurisdictions with similar datasets will also use the scheme, facilitating comparisons between studies. Future studies should test the relative importance of the morbidities identified in the current manuscript and consider other potentially important morbidities for inclusion.

Limitations of the current approach include those common to all studies using administrative data. First, we lack information on lifestyle factors (e.g., diet, smoking, exercise) and on measured blood pressure, which may confound associations between multimorbidity and outcomes or costs. However, this limitation would not be expected to affect the feasibility of applying the algorithms or the prevalence estimates reported here. Second, identification of some of the chronic conditions we studied might have been enhanced by simultaneous consideration of medication data [3]. We decided against using publicly funded medication data to define these conditions because medication coverage in Alberta is limited to people aged ≥65 years, those of lower socioeconomic status, and those with high annual medication costs. Using medication data to define conditions would therefore have inflated the apparent frequency of multimorbidity among older, poorer and sicker participants. We also decided against restricting the cohort to people aged ≥65 years, because multimorbidity is relatively common in younger people and its nature and implications might differ importantly by age; we will therefore include all adult Albertans in our forthcoming analyses. Third, since participants must use medical services to be diagnosed with chronic conditions, our findings underestimate the true population burden of multimorbidity, especially for conditions that are less likely to lead to hospitalization but may still significantly affect quality of life and other important outcomes. Fourth, we did not identify appropriate algorithms for all 40 target conditions, possibly because our searches were not exhaustive, and other important conditions such as obesity were not considered in this study. Our results therefore likely underestimate the true prevalence of multimorbidity.

Finally, although we focused on validated algorithms, the diagnostic performance of algorithms may vary between settings depending on coding practices and the reliability of data capture, and we did not systematically evaluate the quality of the original studies. Therefore, although we had no a priori reason to expect worse performance in our dataset, some algorithms that were of high or moderate validity in other jurisdictions may perform less well when applied to Alberta data, especially with the modifications described herein.

Conclusions

In summary, we assembled a panel of 30 chronic conditions that can be ascertained from administrative data using validated algorithms, facilitating the study and surveillance of multimorbidity. We encourage other groups to use this scheme to facilitate comparisons of multimorbidity data between settings and jurisdictions.