Background

Bariatric surgery is the most effective treatment for severe obesity, a risk factor for many health conditions including cardiovascular diseases and death [1]. Patients who undergo bariatric surgery can achieve effective weight loss and remission of many comorbidities [2, 3]. However, between 2011 and 2018, only 1% of adults with severe obesity in the United States received bariatric surgery in a given year [4, 5]. With the persistent increase in the prevalence of obesity and considerable shift in the type of bariatric surgical operations performed over the last decade [5], it is important to evaluate the long-term comparative effectiveness and safety of different operations.

Administrative claims databases are an important real-world data source in comparative effectiveness and safety research. These databases often provide large and demographically diverse study populations at a fraction of the cost of other data sources [6]. Claims databases also capture most, if not all, medically attended events, including hospitalizations and procedures. However, claims databases are generally considered inadequate for obesity-related research because they lack body mass index (BMI) measurements and because weight-related diagnosis codes are underused and have poor validity [7,8,9,10]. This limitation may not apply to bariatric surgery research, because most health insurers in the United States require surgical facilities to receive approval to perform a given bariatric operation (a.k.a. “prior authorization”). This process involves documentation of eligibility, including a BMI measurement ≥40 kg/m2, or a BMI measurement ≥35 kg/m2 with at least 1 obesity-related comorbidity; this information is typically converted into diagnosis codes in the patient’s medical record and reimbursement claims [11,12,13]. In addition, specific International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) weight-related diagnosis codes denoting BMI ranges became available in 2006, with a subset of diagnosis codes indicating BMI ≥40 kg/m2 becoming effective in January 2011. The more granular ICD-10-CM codes became effective in October 2015. These coding changes and the prior authorization requirements may considerably improve the availability and validity of weight-related diagnosis codes in claims databases among bariatric surgery patients.

In this study, we evaluated the availability and validity of weight-related diagnosis codes before and after bariatric surgical operations in a large claims database linked to an electronic health record (EHR) database with actual BMI measurements.

Methods

Data source

This study used data from the OptumLabs® Data Warehouse (OLDW), which contains linked de-identified administrative claims data for commercially insured and Medicare Advantage enrollees, and de-identified EHR data that have been normalized and standardized into a single database. As of May 2019, the database contained longitudinal health information on over 200 million lives since 2007 (137 million in claims, 88 million in the EHR, and 26 million in the linked component) from a diverse mixture of ages, ethnicities, and geographical regions across the United States [14]. The claims data component includes physician, pharmacy, and facility claims submitted for reimbursement for covered members. Both paid and denied claims are included in the database and analysis, except for pharmacy claims, for which only paid claims are included in the analysis. The EHR component includes clinical diagnoses, procedures, prescriptions, clinical notes, laboratory results, and vital signs (including BMI) recorded as part of routine clinical practice.

Study populations

We created 3 nested study cohorts using different components of OLDW to evaluate the availability (Cohort 1) and validity (Cohorts 2 and 3) of claims-based weight-related diagnosis codes before and after the bariatric surgical operation (Additional file 1 eFigure 1). The study was approved by the Harvard Pilgrim Health Care institutional review board with an exemption and waiver of individual patient consent.

Cohort 1

Using the claims data, we identified a retrospective cohort of patients aged 18 years or older who underwent adjustable gastric banding (AGB), Roux-en-Y gastric bypass (RYGB), or sleeve gastrectomy (SG) between January 1, 2011 and June 30, 2018. Eligible patients had continuous health plan enrollment with medical and pharmacy benefits during the 6-month period preceding the index bariatric operation, which could occur in an inpatient or ambulatory care setting. To minimize the inclusion of patients with non-obesity indications, we excluded patients who had any major bariatric operation, revisional procedure, or gastrointestinal malignancy in the 6-month preoperative period, as well as patients who had an emergency department encounter or a diagnosis of gastrointestinal ulcer on the day of the index operation. We further excluded patients who had multiple conflicting bariatric operation procedure codes on the day of the index operation. The cohort was identified using ICD-9-CM diagnosis and procedure codes (before October 1, 2015); ICD-10-CM diagnosis and ICD-10-PCS procedure codes (on or after October 1, 2015); Current Procedural Terminology, Fourth Edition (CPT-4®) codes; and Healthcare Common Procedure Coding System codes. We used this cohort to evaluate the availability of claims-based weight-related diagnosis codes before and after the bariatric operation.

Cohorts 2 and 3

Cohort 2 consisted of the subset of patients in Cohort 1 who had ≥1 preoperative claims-based weight-related diagnosis code, with the last available code being granular (e.g., V85.30 or Z68.30 indicating BMI between 30.0 and 30.9 kg/m2), and ≥1 EHR-based BMI measurement recorded within ±30 days of that granular code during the 6-month preoperative period (including the index operation day). We used this cohort to evaluate the performance of our claims-based severe obesity and BMI categorization algorithms (defined below) in the preoperative period. Cohort 3 consisted of the subset of patients in Cohort 2 whose last available claims-based postoperative weight-related diagnosis code was granular, with ≥1 EHR-based BMI measurement recorded within ±30 days of this diagnosis code during the 1-year postoperative period. We used Cohort 3 to evaluate the performance of our claims-based algorithms in the postoperative period.

Development of claims-based algorithms for severe obesity and BMI categorization

We created 2 claims-based algorithms using weight-related diagnosis codes (Additional file 1 eTable 1): a severe obesity classification algorithm and a BMI categorization algorithm. The severe obesity classification algorithm classified patients as having “severe obesity” if they had ≥1 claims-based weight-related diagnosis code indicating BMI ≥35 kg/m2 any time during the 6-month preoperative period. In bariatric surgery research, this algorithm can be used as an important cohort selection criterion to identify patients with severe obesity as the treatment indication.
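As an illustration only, the severe obesity rule can be sketched in Python. The code list below is an assumption covering the granular ICD-10-CM (Z68.35–Z68.39, Z68.41–Z68.45) and ICD-9-CM (V85.35–V85.39, V85.41–V85.45) adult BMI codes for BMI ≥35 kg/m2; the study's actual code list is in Additional file 1 eTable 1. Approximating the 6-month window as 183 days and all function and variable names are likewise hypothetical.

```python
from datetime import date, timedelta

# Hypothetical code list: granular adult BMI codes denoting BMI >= 35 kg/m2,
# stored without the decimal point (e.g., "Z68.36" -> "Z6836").
SEVERE_OBESITY_CODES = (
    {f"Z68{n}" for n in range(35, 40)}    # Z68.35-Z68.39: 35.0-39.9
    | {f"Z68{n}" for n in range(41, 46)}  # Z68.41-Z68.45: 40.0 and above
    | {f"V85{n}" for n in range(35, 40)}  # V85.35-V85.39: 35.0-39.9
    | {f"V85{n}" for n in range(41, 46)}  # V85.41-V85.45: 40.0 and above
)


def normalize(code: str) -> str:
    """Strip the decimal point and upper-case a diagnosis code."""
    return code.replace(".", "").strip().upper()


def has_severe_obesity(claims, index_date: date, lookback_days: int = 183) -> bool:
    """True if any claims-based code dated within the preoperative window
    (index_date - lookback_days through index_date, inclusive) denotes BMI >= 35.

    claims: iterable of (service_date, diagnosis_code) tuples for one patient.
    """
    start = index_date - timedelta(days=lookback_days)
    return any(
        start <= service_date <= index_date and normalize(code) in SEVERE_OBESITY_CODES
        for service_date, code in claims
    )


# Usage: a code for BMI 36.0-36.9 kg/m2 about three months before surgery.
claims = [(date(2016, 1, 10), "Z68.36"), (date(2016, 3, 1), "E66.01")]
print(has_severe_obesity(claims, index_date=date(2016, 4, 1)))  # True
```

Nonspecific codes such as E66.01 are ignored here, matching the primary definition; adding them corresponds to the sensitivity analysis described under Sensitivity analyses.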

The BMI categorization algorithm classified a patient’s BMI into 1 of 10 levels as indicated by the last available weight-related diagnosis code, separately during the 6-month preoperative and 1-year postoperative periods (BMI levels, kg/m2: ≤19.9, 20.0–24.9, 25.0–29.9, 30.0–34.9, 35.0–39.9, 40.0–44.9, 45.0–49.9, 50.0–59.9, 60.0–69.9, and ≥70.0). This algorithm can be used to measure the last available preoperative BMI, which is an important covariate for comparative effectiveness research on bariatric surgery because preoperative BMI may be associated with both operation choice and the risks of many health outcomes. The algorithm can also capture the last available BMI within a defined postoperative follow-up period (e.g., 1 year in this study) for weight-related outcome assessment.
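A minimal sketch of the last-available-code rule follows, restricted for brevity to the ICD-10-CM Z68 adult BMI codes (ICD-9-CM V85 codes would need an analogous mapping, not shown). The window lengths (183 days before and 365 days after the operation), the handling of same-day codes, and all names are assumptions rather than the study's specification.

```python
from datetime import date, timedelta

# The 10 BMI levels (kg/m2) used by the categorization algorithm.
LEVELS = ["<=19.9", "20.0-24.9", "25.0-29.9", "30.0-34.9", "35.0-39.9",
          "40.0-44.9", "45.0-49.9", "50.0-59.9", "60.0-69.9", ">=70.0"]


def bmi_level(code: str):
    """Map a granular ICD-10-CM adult BMI code (Z68.1, Z68.20-Z68.39,
    Z68.41-Z68.45) to a level index 0-9; return None for any other code."""
    c = code.replace(".", "").strip().upper()
    if c == "Z681":                      # BMI 19.9 or less
        return 0
    if len(c) == 5 and c.startswith("Z68") and c[3:].isdigit():
        n = int(c[3:])
        if 20 <= n <= 39:                # one code per BMI unit, 20.0-39.9
            return (n - 20) // 5 + 1     # 20-24 -> 1, 25-29 -> 2, 30-34 -> 3, 35-39 -> 4
        if 41 <= n <= 45:                # 40.0-44.9, 45.0-49.9, 50.0-59.9, 60.0-69.9, >=70
            return n - 36                # 41 -> 5, ..., 45 -> 9
    return None


def last_bmi_level(claims, start: date, end: date):
    """Level of the latest granular code dated within [start, end], or None.
    Codes on the same latest date are resolved by input order (first listed wins)."""
    dated = [(d, bmi_level(code)) for d, code in claims if start <= d <= end]
    dated = [(d, lvl) for d, lvl in dated if lvl is not None]
    if not dated:
        return None
    return max(dated, key=lambda pair: pair[0])[1]


# Usage with toy claims for one patient.
index = date(2016, 4, 1)
claims = [(date(2016, 1, 5), "Z68.41"), (date(2016, 3, 20), "Z68.38"),
          (date(2016, 3, 25), "E66.01"), (date(2016, 9, 1), "Z68.32")]
pre = last_bmi_level(claims, index - timedelta(days=183), index)
post = last_bmi_level(claims, index + timedelta(days=1), index + timedelta(days=365))
print(LEVELS[pre], LEVELS[post])  # 35.0-39.9 30.0-34.9
```

The nonspecific E66.01 code on March 25 is skipped, so the last granular preoperative code (Z68.38) determines the preoperative level.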

Validation of claims-based algorithms for severe obesity and BMI categorization

We used EHR-based BMI measurements recorded during an encounter to validate the claims-based algorithms. We classified patients as having severe obesity if they had ≥1 EHR-based BMI measurement ≥35 kg/m2 any time during the 6-month preoperative period. For BMI categorization, we classified the following EHR-based BMI measurements, separately, into 1 of the 10 levels described above: for preoperative analyses, the measurement most proximate to (and within ±30 days of) the last available claims-based diagnosis code in the 6-month preoperative period; for postoperative analyses, the last available EHR-based BMI measurement in the 1-year postoperative period.
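The EHR-based reference classifications can be sketched as follows. The code assumes each patient's EHR BMI measurements are available as (date, value) pairs; choosing the earlier measurement when two are equally close to the claims code is an assumption, since no tie-breaking rule is stated, and all names are hypothetical.

```python
import bisect
from datetime import date, timedelta

# Lower bounds (kg/m2) of levels 1-9; values below 20.0 fall in level 0 (<=19.9).
CUTPOINTS = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0, 60.0, 70.0]


def level_from_value(bmi: float) -> int:
    """Place a measured BMI into one of the 10 levels (0 = <=19.9, 9 = >=70.0)."""
    return bisect.bisect_right(CUTPOINTS, bmi)


def ehr_severe_obesity(measurements, index_date: date, lookback_days: int = 183) -> bool:
    """Reference standard: >=1 EHR BMI >= 35 in the preoperative window (inclusive)."""
    start = index_date - timedelta(days=lookback_days)
    return any(start <= d <= index_date and v >= 35.0 for d, v in measurements)


def most_proximate_bmi(measurements, anchor: date, window_days: int = 30):
    """EHR BMI closest in time to `anchor` (e.g., the date of the last claims-based
    code) within +/- window_days; None if no measurement qualifies."""
    candidates = [(abs((d - anchor).days), d, v) for d, v in measurements
                  if abs((d - anchor).days) <= window_days]
    if not candidates:
        return None
    return min(candidates)[2]  # smallest distance first, then earliest date


# Usage: claims-based code dated March 20; the May 1 measurement is outside +/-30 days.
ehr = [(date(2016, 3, 1), 41.2), (date(2016, 3, 18), 38.7), (date(2016, 5, 1), 36.0)]
value = most_proximate_bmi(ehr, anchor=date(2016, 3, 20))
print(value, level_from_value(value))  # 38.7 4
```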

Statistical analyses

Availability and predictors of weight-related diagnosis codes during the preoperative and postoperative periods

We described the presence of weight-related ICD-9-CM and ICD-10-CM diagnosis codes occurring any time in the 6-month preoperative period and the 1-year postoperative period, separately, in Cohort 1. We also performed the analysis by operation type, calendar year, and coding era (before October 1, 2015 for the ICD-9-CM era; October 1, 2015 and later for the ICD-10-CM era). We assessed factors associated with the presence of preoperative and postoperative claims-based weight-related diagnosis codes, separately, using logistic regression models. Factors selected a priori included demographic characteristics, region of residence, calendar year, coding era, type of index bariatric operation, care setting of index operation, and medical history measured in the 6-month preoperative period (including the Charlson-Elixhauser comorbidity index score [15], individual comorbid conditions, and prior hospital admissions). The Charlson-Elixhauser comorbidity index score was originally developed to predict mortality risk in older patients [15]; we used the score as a proxy for general health status.

Performance of the severe obesity classification algorithm during the preoperative period

We assessed the performance of the severe obesity classification algorithm using sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) within Cohort 2, with the EHR-based classification as the reference standard. Sensitivity was calculated as the proportion of patients correctly classified as having severe obesity based on claims-based diagnosis codes (i.e., true positives) among those classified as such based on their EHR-based BMI measurements. Specificity was calculated as the proportion of patients correctly classified as not having severe obesity based on claims-based diagnosis codes (i.e., true negatives) among those whose EHR-based BMI measurements indicated no severe obesity. PPV was calculated as the proportion of true positives among patients classified as having severe obesity based on claims-based diagnosis codes. NPV was calculated as the proportion of true negatives among patients classified as not having severe obesity based on claims-based diagnosis codes.
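The four measures can be computed from the 2×2 cross-classification of claims-based and EHR-based severe obesity status. The following Python sketch assumes one boolean pair per patient; the function name is hypothetical, and undefined ratios (zero denominators) are returned as NaN.

```python
import math


def diagnostic_metrics(claims_positive, ehr_positive):
    """Sensitivity, specificity, PPV, and NPV of a claims-based classification
    against an EHR-based reference standard (parallel sequences of booleans)."""
    pairs = list(zip(claims_positive, ehr_positive))
    tp = sum(1 for c, e in pairs if c and e)          # true positives
    fp = sum(1 for c, e in pairs if c and not e)      # false positives
    fn = sum(1 for c, e in pairs if not c and e)      # false negatives
    tn = sum(1 for c, e in pairs if not c and not e)  # true negatives

    def ratio(num, den):
        return num / den if den else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy data (not study data): 7 patients, giving TP=3, FP=1, FN=1, TN=2.
claims = [True, True, True, True, False, False, False]
ehr = [True, True, True, False, False, False, True]
print(diagnostic_metrics(claims, ehr))  # sensitivity 0.75, specificity ~0.67, PPV 0.75, NPV ~0.67
```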

Performance of the BMI categorization algorithm during the preoperative and postoperative periods

We evaluated the performance of the BMI categorization algorithm separately in the 6-month preoperative period using Cohort 2 and in the 1-year postoperative period using Cohort 3. In both periods, we assessed the concordance between the last available claims-based weight-related diagnosis code and its most proximate EHR-based BMI measurement recorded within ±30 days of the claims-based diagnosis code by estimating the weighted Cohen’s kappa. Cohen’s kappa measures agreement beyond that expected by chance; the weighted variant additionally gives partial credit to disagreements according to their distance from perfect agreement (i.e., how many levels apart the two classifications fall) [16]. The weighted kappa ranges from −1 to 1, with negative values possible but unlikely in practice. In general, kappa values >0.75 are considered excellent, 0.40–0.75 fair to good, and <0.40 poor agreement [17]. In both periods, we also estimated the sensitivity, specificity, PPV, and NPV within each level of the algorithm.
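A minimal sketch of the weighted kappa and the per-level measures follows. The weighting scheme is not stated above, so linear (Cicchetti–Allison) agreement weights, w_ij = 1 − |i − j|/(k − 1), are assumed; quadratic (Fleiss–Cohen) weights are a common alternative and would give different values. Levels are coded as integers 0 to k − 1 (k ≥ 2), with claims-based levels as rows and EHR-based levels as columns. The function names are hypothetical, and the sketch returns point estimates only (no confidence intervals).

```python
def weighted_kappa(claims_levels, ehr_levels, k):
    """Weighted Cohen's kappa with linear agreement weights for k >= 2 ordered levels."""
    n = len(claims_levels)
    # Observed joint proportions: rows = claims-based level, columns = EHR-based level.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(claims_levels, ehr_levels):
        obs[a][b] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    # Linear weights: full credit on the diagonal, partial credit shrinking with distance.
    w = [[1.0 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    p_obs = sum(w[i][j] * obs[i][j] for i in range(k) for j in range(k))
    p_exp = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    den = 1.0 - p_exp
    return (p_obs - p_exp) / den if den > 1e-12 else float("nan")  # undefined if chance agreement is 1


def per_level_metrics(claims_levels, ehr_levels, k):
    """One-vs-rest sensitivity, specificity, PPV, and NPV for each level."""
    def ratio(num, den):
        return num / den if den else float("nan")

    pairs = list(zip(claims_levels, ehr_levels))
    results = {}
    for lvl in range(k):
        tp = sum(1 for c, e in pairs if c == lvl and e == lvl)
        fp = sum(1 for c, e in pairs if c == lvl and e != lvl)
        fn = sum(1 for c, e in pairs if c != lvl and e == lvl)
        tn = len(pairs) - tp - fp - fn
        results[lvl] = {"sensitivity": ratio(tp, tp + fn), "specificity": ratio(tn, tn + fp),
                        "ppv": ratio(tp, tp + fp), "npv": ratio(tn, tn + fn)}
    return results


# Toy example: one exact match at level 0, one one-level miss, one exact match at level 2.
print(round(weighted_kappa([0, 1, 2], [0, 2, 2], k=3), 3))  # 0.667
```

A one-level disagreement earns half credit when k = 3 under linear weights, which is why the toy example scores well above what unweighted kappa would give for the same data.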

Sensitivity analyses

We examined an alternative severe obesity classification algorithm using BMI ≥40 kg/m2 as the cutoff. We also varied the BMI categorization algorithm by (1) using larger BMI intervals (5-level categories, kg/m2: ≤29.9, 30.0–39.9, 40.0–49.9, 50.0–59.9, ≥60.0; 4-level categories: underweight ≤19.9, normal 20.0–24.9, overweight 25.0–29.9, obese ≥30.0), and (2) adding nonspecific weight-related diagnosis codes (e.g., 278.00/E66.9 [unspecified obesity], 278.01/E66.01 [morbid obesity], 278.03/E66.2 [obesity hypoventilation syndrome], E66.09 [other obesity due to excess calories], E66.1 [drug-induced obesity], and E66.8 [other obesity] for obese), and assessed their performance during the preoperative and postoperative periods (Additional file 1 eTable 2). In addition, we examined the impact of the proximity restriction between the claims-based weight-related diagnosis code and the EHR-based BMI measurement on their concordance in the preoperative and postoperative periods. We also separately evaluated the performance of the BMI categorization algorithm for the last available BMI during the 6-month and 2-year postoperative periods. We performed all analyses with SAS Enterprise Guide 7.13 for Windows (SAS Institute, Cary, NC).
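The coarser schemes used in the sensitivity analyses can be derived by collapsing the 10 levels, as in the sketch below; the lookup-table and function names are hypothetical. Because the 5-level and 4-level boundaries coincide with 10-level boundaries, each 10-level index maps to exactly one coarser category.

```python
# 10-level index: 0 <=19.9, 1 20.0-24.9, 2 25.0-29.9, 3 30.0-34.9, 4 35.0-39.9,
# 5 40.0-44.9, 6 45.0-49.9, 7 50.0-59.9, 8 60.0-69.9, 9 >=70.0 (kg/m2).

# 5-level: 0 <=29.9, 1 30.0-39.9, 2 40.0-49.9, 3 50.0-59.9, 4 >=60.0
TEN_TO_FIVE = [0, 0, 0, 1, 1, 2, 2, 3, 4, 4]

# 4-level: 0 underweight <=19.9, 1 normal 20.0-24.9, 2 overweight 25.0-29.9, 3 obese >=30.0
TEN_TO_FOUR = [0, 1, 2, 3, 3, 3, 3, 3, 3, 3]


def collapse(levels, mapping):
    """Re-express 10-level indices in a coarser scheme; None (no code) stays None."""
    return [None if lvl is None else mapping[lvl] for lvl in levels]


claims_10 = [0, 3, 4, 5, 7, 9, None]
print(collapse(claims_10, TEN_TO_FIVE))  # [0, 1, 1, 2, 3, 4, None]
print(collapse(claims_10, TEN_TO_FOUR))  # [0, 3, 3, 3, 3, 3, None]
```

The collapsed claims-based and EHR-based levels can then be passed to a weighted kappa routine with k = 5 or k = 4.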

Results

Population characteristics

Cohort 1 included 29,357 patients, with 2941 (10.0%) having AGB, 9445 (32.2%) having RYGB, and 16,971 (57.8%) having SG. Table 1 shows their baseline characteristics. The population was largely female (75.5%) and white (67.0%) with a mean age of 47.0 years. The most prevalent comorbid conditions were hypertension (68.8%), gastroesophageal reflux disease (62.7%), and dyslipidemia (55.3%).

Table 1 Baseline characteristics of 29,357 patients who received a bariatric surgical operation, 2011–2018 (Cohort 1)

Cohort 2 included 3045 patients from Cohort 1 who had both claims-based weight-related diagnosis codes (with the last preoperative code being granular) and EHR-based BMI measurements in the 6-month preoperative period; 196 (6.4%) had AGB, 1055 (34.6%) had RYGB, and 1794 (58.9%) had SG. Compared with Cohort 1, patients in Cohort 2 were slightly older on average (47.6 years), and slightly more had hypertension (69.5%) and dyslipidemia (56.4%). On the index operation day, 77.6% had both claims-based diagnosis codes and EHR-based BMI measurements.

Cohort 3 included 511 patients from Cohort 2 whose last available claims-based weight-related diagnosis code in the 1-year postoperative period was granular, with ≥1 EHR-based BMI measurement within ±30 days of that diagnosis code; 31 (6.1%) had AGB, 190 (37.2%) had RYGB, and 290 (56.8%) had SG. Compared with Cohorts 1 and 2, patients in Cohort 3 were older on average (48.9 years), more had hypertension (71.8%) and dyslipidemia (58.1%), and fewer had non-alcoholic fatty liver disease (22.5%) or diagnosis codes indicating smoking (1.8%). On average, patients had their first postoperative weight-related diagnosis code around 57 days after the index operation and their last available diagnosis code 159 days before the end of the 1-year follow-up.

Presence of weight-related diagnosis codes

6-month preoperative period

Most patients in Cohort 1 had ≥1 claims-based weight-related diagnosis code: 27,407 (93.4%) had granular codes, 1421 (4.8%) had only nonspecific codes, and 529 (1.8%) had none. The proportion of patients without a weight-related diagnosis code decreased from 3.4% in 2011 to 1.6% in 2018, while the proportion with granular codes increased from 86.5% in 2011 to 97.1% in 2018 (Fig. 1). Granular diagnosis codes were more prevalent in the ICD-10-CM era than in the ICD-9-CM era (96.8% versus 91.1%). Similar increasing trends were observed across operation types, with a higher prevalence of granular diagnosis codes among SG patients (Additional file 1 eFigures 2 & 3).

Fig. 1 Presence of claims-based weight-related ICD-9-CM or ICD-10-CM diagnosis codes during the 6-month preoperative period for bariatric surgery patients in 2011–2018

1-year postoperative period

Among the 27,407 patients with granular weight-related diagnosis codes in the 6-month preoperative period in Cohort 1, 12,346 (45.0%) had granular codes, 9355 (34.1%) had nonspecific codes, and 5706 (20.8%) did not have any codes in the first postoperative year (Fig. 2). The distribution of diagnosis codes was similar among patients receiving different types of operation.

Fig. 2 Presence of claims-based weight-related ICD-9-CM or ICD-10-CM diagnosis codes during the first postoperative year. Left panel: among all patients who underwent one of the three main bariatric surgical operations in 2011–2018; Middle panel: among patients who had weight-related diagnosis codes during the 6-month preoperative period; Right panel: among patients who had granular weight-related diagnosis codes during the 6-month preoperative period. Granular codes are diagnosis codes denoting narrow body mass index (BMI) ranges (e.g., V85.30 or Z68.30 indicating BMI between 30.0 and 30.9 kg/m2); nonspecific codes are diagnosis codes denoting broad BMI ranges or obesity status

Factors associated with the presence of weight-related diagnosis codes

6-month preoperative period

Compared with patients who had claims-based weight-related diagnosis codes, those without codes in Cohort 1 were more likely to be male, Asian, or older, to have had more hospital stays before the operation, or to have received the operation in an ambulatory care setting (Table 2). Among patients who had weight-related diagnosis codes, those with granular codes (e.g., V85.30) were more likely to have had SG, to be covered by Medicare Advantage plans, or to have had the operation in an inpatient setting or in more recent years (Additional file 1 eTable 3).

Table 2 Determinants of missing weight-related diagnosis codes during the 6-month preoperative period, 2011–2018 (Cohort 1)

1-year postoperative period

Compared with patients who had claims-based weight-related diagnosis codes, those without codes in Cohort 1 were more likely to have received AGB, be younger, be male, be commercially insured, or lack preoperative weight-related diagnosis codes (Additional file 1 eTable 4). Among patients who had weight-related diagnosis codes in the postoperative year, those with granular codes were more likely to be older, be covered by Medicare Advantage plans, have comorbid conditions, have received SG, or have had the operation in an inpatient setting or in more recent years (Additional file 1 eTable 5).

Performance of the claims-based algorithms

6-month preoperative period

In Cohort 2, the severe obesity classification algorithm (i.e., presence of a code indicating BMI ≥35 kg/m2) in the 6-month preoperative period had a sensitivity of 100%, a specificity of 71%, a PPV of 100%, and an NPV of 78% (Additional file 1 eTable 6). When classifying the last available preoperative weight-related diagnosis code into 10 levels, the BMI categorization algorithm had a weighted kappa of 0.78 (95% confidence interval 0.76, 0.79). The specificity and NPV were high for all BMI levels; the sensitivity and PPV were above 60% for most BMI levels ≥35 kg/m2 (e.g., BMI 35.0–39.9, sensitivity 64%, specificity 97%, PPV 81%, NPV 93%; 40.0–44.9, sensitivity 76%, specificity 87%, PPV 71%, NPV 90%) and the sensitivity was lowest for BMI between 30.0 and 34.9 kg/m2 (30%) (Table 3).

Table 3 Validation results for the BMI categorization algorithm in the 6-month preoperative (Cohort 2) and 1-year postoperative periods (Cohort 3)

1-year postoperative period

In Cohort 3, the BMI categorization algorithm had a weighted kappa of 0.84 (95% confidence interval 0.80, 0.87). The specificity and NPV were high for all BMI levels while the sensitivity was above 70% and the PPV was above 60% for most BMI levels (Table 3).

Sensitivity analyses

When the severe obesity classification algorithm was varied to detect the presence of BMI ≥40 kg/m2 during the 6-month preoperative period, both the specificity and NPV increased (75 and 83%, respectively), while the sensitivity and PPV dropped slightly (98 and 96%, respectively). Expanding the algorithms to include nonspecific weight-related diagnosis codes (e.g., 278.01) resulted in a meaningful decrease in specificity (Additional file 1 eTable 6).

The 5-level BMI categorization algorithm had similar concordance to the 10-level categorization, while the 4-level BMI categorization algorithm had excellent concordance, with a weighted kappa above 0.90 in both the preoperative and postoperative periods (Table 3). Expanding the algorithms to include nonspecific weight-related diagnosis codes had minimal impact on their performance (Additional file 1 eTable 7). Relaxing the proximity requirement between the timing of the claims-based weight-related diagnosis codes and the EHR-based BMI measurements increased the size of the validation sample; this did not change their concordance during the 6-month preoperative period but reduced their concordance in the 1-year postoperative period (Additional file 1 eFigure 4). The BMI categorization algorithm for the last available BMI performed well in the 6-month and 2-year postoperative periods (Additional file 1 eTable 8).

Discussion

In a large administrative claims database, we found that nearly all bariatric surgery patients had preoperative weight-related diagnosis codes, and that the presence of granular weight-related diagnosis codes increased substantially in both the preoperative and postoperative periods between 2011 and 2018. The claims-based algorithm for severe obesity, which classified patients as having severe obesity if they had a diagnosis code indicating BMI ≥35 kg/m2, had high sensitivity and PPV but moderate specificity and NPV. The BMI categorization algorithm, which categorized weight-related diagnosis codes into BMI levels, had excellent concordance with EHR-based BMI measurements, with high specificity, PPV, and NPV across all levels and higher sensitivity at higher BMI levels.

The persistently high prevalence of claims-based weight-related diagnosis codes, including granular and nonspecific codes, in the preoperative period across the study years reflects high adherence to insurance reimbursement requirements [11,12,13]. The higher prevalence of weight-related diagnosis codes observed in the ICD-10-CM era than in the ICD-9-CM era is consistent with a previous study of claims-based diagnosis codes in the general population [10].

The BMI categorization algorithm had different sensitivities for the BMI level 30.0–34.9 kg/m2 in the preoperative and postoperative periods (30% versus 84%). During the 6-month preoperative period, 70% of patients with an EHR-based BMI measurement between 30.0 and 34.9 kg/m2 had a granular weight-related diagnosis code indicating BMI ≥35 kg/m2, whereas during the first postoperative year, only 15% of those with a BMI measurement between 30.0 and 34.9 kg/m2 had a diagnosis code indicating BMI ≥35 kg/m2. Patients with borderline BMI levels immediately before a bariatric operation might have undergone preoperative weight loss required by their insurer or encouraged by their clinical program; consistent with this explanation, half of them had ≥1 BMI measurement ≥35 kg/m2 within the prior 30 days. These patients might also have been up-coded with a higher weight-related diagnosis code to meet the prior authorization requirement.

Claims databases for bariatric surgery research: a glass half-full or half-empty?

The high prevalence and validity of weight-related diagnosis codes before a bariatric operation in claims databases make it feasible to use these codes to capture a large proportion of eligible patients, especially when researchers impose additional eligibility criteria to exclude patients with non-obesity indications, as we did in this study. In addition, the high concordance between the claims-based BMI categorization algorithm and actual BMI measurements, along with its high validity, suggests that these preoperative weight-related diagnosis codes can be used for baseline confounding control.

On the other hand, despite a considerable increase across years and high validity, the presence of weight-related diagnosis codes remained low in the first postoperative year, with around 80% of patients having any code and around 60% having granular codes in 2017 and 2018. This suboptimal availability of weight-related diagnosis codes in the postoperative period makes it more challenging to use claims databases for weight-related effectiveness research. In addition, postoperative coding could be differential: patients with granular weight-related diagnosis codes were older and had more comorbid conditions (Additional file 1 eTable 5). These patients may therefore not be representative of the overall study population; for example, some of them may be preparing for a second-stage operation or may have had inadequate weight loss after their index operation. It is thus important to weigh internal validity against generalizability when using postoperative weight-related diagnosis codes for weight-related effectiveness research. When all relevant factors contributing to the presence of postoperative granular diagnosis codes are measured, results from patients with granular codes could be generalized to the overall study population using appropriate statistical approaches, such as inverse probability weighting [18]. Taken together, our findings support the use of administrative claims data for bariatric surgery research on non-weight-related outcomes that are generally well captured, such as rehospitalization, reoperation, venous thromboembolism, or remission of certain comorbidities including type 2 diabetes [19,20,21,22].
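To make the weighting idea concrete, the sketch below reweights patients who have a postoperative granular code by the inverse of their estimated probability of having one, so that they stand in for similar patients without a code. For simplicity it estimates that probability nonparametrically within strata of measured factors (a logistic model is more typical in practice), and it assumes that the strata capture all factors affecting coding and that every stratum contains at least one coded patient. The stratum labels, outcome, and function name are hypothetical.

```python
from collections import defaultdict


def ipw_mean(records):
    """Inverse-probability-weighted (normalized) mean of an outcome observed only
    for some patients.

    records: list of (stratum, observed, outcome) tuples; outcome is ignored
    (may be None) when observed is False.
    """
    total = defaultdict(int)  # patients per stratum
    coded = defaultdict(int)  # patients with an observed outcome per stratum
    for stratum, observed, _ in records:
        total[stratum] += 1
        coded[stratum] += int(observed)

    weighted_sum = weight_total = 0.0
    for stratum, observed, outcome in records:
        if observed:
            weight = total[stratum] / coded[stratum]  # 1 / P(observed | stratum)
            weighted_sum += weight * outcome
            weight_total += weight
    return weighted_sum / weight_total


# Toy example (not study data): outcome = 1-year BMI (kg/m2) from a granular code.
records = [("older", True, 34.0), ("older", True, 36.0),
           ("younger", True, 30.0), ("younger", False, None),
           ("younger", False, None), ("younger", False, None)]
print(ipw_mean(records))  # about 31.67 (unweighted mean of coded patients: about 33.33)
```

Because younger patients in this toy example are rarely coded, each coded younger patient receives a weight of 4, pulling the estimate toward the younger stratum's outcome.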

Strengths and limitations

This study used contemporary data from a large administrative claims database linked with EHR data to validate two claims-based weight-related algorithms. Prior studies focused on either claims-based algorithms in the general population [8, 10] or the broad four-level obesity classification algorithm for bariatric surgery patients in the preoperative period [23]. We evaluated the validity of these diagnosis codes during both the preoperative and postoperative periods, providing information for researchers who are interested in using administrative claims databases to study weight-related effectiveness outcomes. Our findings add to the knowledge base on the quality and suitability of administrative claims data, a real-world data source, for the generation of real-world evidence in bariatric surgery research [24].

One limitation of our study is the small sample size for the postoperative period resulting from the proximity requirement on the EHR-based BMI measurement, which may limit the generalizability of our results. In sensitivity analyses where we relaxed the proximity requirement, the size of the validation sample increased but no substantial change was observed in the validity of postoperative weight-related diagnosis codes. Moreover, linked EHR data were available only for a small subset of patients identified in claims, namely those who received care at health care systems that contribute EHR data to OLDW, raising the possibility of unmeasured factors affecting our analyses and further limiting the generalizability of our results.

Conclusions

Among bariatric surgery patients identified within administrative claims databases, the validity of weight-related diagnosis codes was excellent during both the preoperative and postoperative periods. These findings support the use of administrative claims databases, even in the absence of BMI measurements, for bariatric surgery research on non-weight-related effectiveness and safety outcomes that are generally well captured in these databases. However, the availability of weight-related diagnosis codes was suboptimal during the postoperative period, making it more challenging to use claims databases for weight-related effectiveness research.