Introduction

Among various forms of dementia, Alzheimer’s disease (AD) is considered a particularly debilitating neurodegenerative disease that has had a significant social impact [22]. Development of treatments for AD, drug or otherwise, is progressing but has been slow [12], in large part due to the limited understanding of the pathophysiology of the disease. While there are clear indicators of AD pathology such as amyloid plaque buildup and neurofibrillary tangles, the functional roles of such disease indicators and their contribution to the pathology are still unclear [6]. As an alternative to de novo drug development, researchers have more recently begun investigating previously approved drugs that could be repurposed for the treatment or prevention of AD [5, 9, 13, 17, 25], taking advantage of the reduced time and cost associated with drug repurposing. Over 50 repurposed drugs are already being tested in clinical trials [2], and more potential candidates continue to emerge from ongoing research.

Medical records, such as insurance and health records, can be likened to a treasure trove of clinical data, with the capability to provide statistical insights on drug use, disease incidence, medical costs, patient demographics, admission rates, and more [14, 26]. The plethora and variety of data can be joined together to provide a correlative analysis which, while perhaps not conclusive, can provide some directions on potential avenues of research. One such application is considering drug-disease relationships within patients for the purpose of identifying or supporting drug-repurposing targets [9, 33]. In this study, we investigate a compilation of insurance records from a US commercial insurance group, with coverage of the group’s members dating back to 2012. The database thus far has been used in analyses such as drug cost comparisons between hospitals and physician offices [24], cost-saving differences between cancer screening methods [23], trends in drug prices [32], opioid prescription trends during the opioid crisis [35], and relationships between COVID and pre-existing conditions [31]. Thus far, the database has yet to be used for observing correlative drug-disease relationships.

In this study, we report the correlations observed between Alzheimer’s disease and drug prescriptions from the health records of the insurance database. Insurance claims data offers the ability to mine for associations in massive databases spanning millions of individuals. Here, we measured drug associations with two disease metrics: (1) AD incidence rate and time to diagnosis and (2) frequency of AD claims as a proxy for disease severity. The analyses conducted here may reveal previously unidentified trends between drug prescriptions and AD diagnosis within a commercially insured population. It may also serve as a reference to those investigating potential repurposing candidates and/or provide additional support for previously identified candidates for the treatment and/or prevention of Alzheimer’s disease.

Methods and Materials

All member and claim data for the study were accessed through Blue Cross Blue Shield Axis®, the largest collection of secure commercial claims, medical professional, and cost of care information. The limited dataset of claims data is derived from the independent, locally operated Blue Cross Blue Shield companies across the USA. The database comprises over 400 million claims compiled from over 9 years’ worth of data (approximately from 2012 to 2021). The derived records are considered primary data sources that include member demographics, claims made during hospital or doctor visits, and pharmaceutical claims for drug prescriptions.

For the purposes of this study, we defined the incidence of a disease based on the ICD-9 code [30] listed as the primary diagnosis within each medical claim. For AD, we used the ICD-9 code 331.0. Prescription drug information within the database was defined according to the National Drug Code (NDC) format. We mapped the NDC IDs to their active ingredients via RxNorm [19]. Thus in this study, a drug was defined as one of the RxNorm (active ingredient) codes mapped in this way. Drug users were defined as members that made at least one drug prescription claim mapped in this way, while non-users were members that never made a claim for the drug in their available coverage history.
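The user and non-user definition above can be illustrated with a minimal sketch. The lookup table, the NDC strings, and the function name `split_users` are hypothetical placeholders; in practice the NDC-to-ingredient mapping would come from RxNorm rather than a hand-written dictionary.

```python
# Hypothetical NDC -> active-ingredient lookup (placeholder codes, not real NDCs).
NDC_TO_INGREDIENT = {
    "00000-0001-01": "ingredient_a",
    "00000-0002-01": "ingredient_a",  # different product, same active ingredient
    "00000-0003-01": "ingredient_b",
}


def split_users(pharmacy_claims, all_members, ingredient):
    """Split members into users and non-users of one active ingredient.

    pharmacy_claims: iterable of (member_id, ndc) pairs.
    all_members: iterable of every member_id under consideration.
    A user has at least one claim mapping to the ingredient;
    a non-user has none in the available history.
    """
    users = set()
    for member_id, ndc in pharmacy_claims:
        if NDC_TO_INGREDIENT.get(ndc) == ingredient:
            users.add(member_id)
    non_users = set(all_members) - users
    return users, non_users


claims = [
    ("m1", "00000-0001-01"),
    ("m2", "00000-0003-01"),
    ("m1", "00000-0003-01"),
]
users, non_users = split_users(claims, {"m1", "m2", "m3"}, "ingredient_a")
```

Here `m1` is a user of `ingredient_a`, while `m2` and `m3` are non-users because neither has a claim that maps to it.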

Two primary outcomes were considered: (1) AD incidence rate and time to diagnosis and (2) healthcare utilization related to the disease, represented by the number of AD claims. Both analyses were done on a per-drug basis. In summary, for each drug evaluated, members were grouped into users and non-users (Supplementary Methods Figures S1, S3); drug users were propensity matched to members of the non-user group based on age, gender, unique drug usage, and presence of common diseases (via ICD-9 codes) (Supplementary Methods Figure S2); and statistical associations between drug use and AD outcomes were tested. The 200 most common drugs taken by members in the AD group were chosen for the two analyses; additionally, repurposing candidates that were in clinical trials as of 2020 [2] were separately analyzed for AD incidence rate. An overview of the incidence and claim count analyses is provided in Fig. 1. More detailed explanations for each analysis are provided in the Supplementary Methods.

Fig. 1 Overview of the statistical analysis methods used in this study
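As an illustration of the propensity-matching step, the sketch below pairs users with non-users once propensity scores are available. The matching algorithm actually used is described only in the Supplementary Methods, so the 1:1 greedy nearest-neighbor strategy, the caliper value, and the function name `greedy_match` are assumptions made for this example. The scores are taken as given and would normally come from a model fitted on age, gender, drug usage, and disease indicators.

```python
def greedy_match(user_scores, nonuser_scores, caliper=0.05):
    """Pair each user with the closest unused non-user within the caliper.

    user_scores / nonuser_scores: dicts of member_id -> propensity score.
    Returns a list of (user_id, nonuser_id) pairs; users with no non-user
    inside the caliper are left unmatched.
    """
    available = dict(nonuser_scores)  # non-users not yet matched
    pairs = []
    # Process users in order of increasing score for reproducibility.
    for user_id, score in sorted(user_scores.items(), key=lambda kv: kv[1]):
        if not available:
            break
        best = min(available, key=lambda m: abs(available[m] - score))
        if abs(available[best] - score) <= caliper:
            pairs.append((user_id, best))
            del available[best]  # match without replacement
    return pairs


pairs = greedy_match(
    {"u1": 0.30, "u2": 0.70},
    {"n1": 0.32, "n2": 0.90, "n3": 0.69},
)
```

Matching without replacement keeps each non-user in at most one pair, which is what a paired downstream test requires.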

Inflation of test statistics was controlled using a median quantile adjustment similar to genomic control [7] for the primary analyses. This was achieved by determining the inflation factor (the ratio of the median observed test statistic to its theoretical value), scaling the test statistics by the inflation factor, and recalculating p-values. Afterwards, multiple testing correction was conducted using the Benjamini-Hochberg method [11].
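The adjustment and correction steps can be sketched as follows. This is a minimal illustration, assuming the test statistics are expressed as chi-square statistics with one degree of freedom (for example, squared z-scores); the exact form of the statistics used in the study is given in the Supplementary Methods, and the function names are illustrative.

```python
import math
from statistics import median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def genomic_control(chi2_stats):
    """Return (inflation factor, inflation-adjusted p-values)."""
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    adjusted_p = [chi2_sf_1df(s / lam) for s in chi2_stats]
    return lam, adjusted_p


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg corrected p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    corrected = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        corrected[i] = running_min
    return corrected


lam, adj_p = genomic_control([2 * CHI2_1DF_MEDIAN] * 3)
bh = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

In the toy call, every statistic is twice the theoretical median, so the inflation factor is 2 and each adjusted p-value lands at the null median of about 0.5.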

All analyses were conducted within a virtual computational environment managed by the Blue Cross Blue Shield Association to ensure proper privacy protection of the sensitive patient data. All modeling and statistical measurements were performed in Python using the statsmodels package, version 0.10.0 [27]. The IRB protocol number for this project is IRB-19–7372.

Results

General statistics and inclusion criteria

The entirety of claims data available spans from 2012 up until the end of 2020 and contains approximately 113 million distinct members, 751 million inpatient medical claims, 5.2 billion outpatient medical claims, and 3.9 billion pharmaceutical-related claims. There were a total of 143,761 members that had made at least one claim for Alzheimer’s disease (ICD-9 code 331.0). Of these members, 92,323 were female and 51,438 were male, and the mean and median ages were 86 and 88, respectively. The filters we used in the incidence analysis are described in the following. First is our consideration of age. Alzheimer’s disease disproportionately affects the elderly population, typically those 65 years or older [22]. To improve efficiency in matching and reduce bias from having a large population of younger individuals, a broad filter was applied where only members that were of age 65 or older at the midpoint of the coverage range (2016) were considered (i.e., over 60 in 2012, or 70 or older in 2021).

Next, we address our decision to use only the individuals with full coverage history. Our concern is the increased matching of pairs with insufficient disease profiles due to a lack of data from a short coverage period. For example, individuals with only a few months of coverage would be matched simply because they made few to no claims during that short time period. It was in our interest to reduce such variability in the matches as much as possible; as a result, we opted to include a filter for the full coverage range of 9 years.

In summary, we only included members that had (1) full coverage history within the database (2012 to 2020), (2) BCBS as their primary provider, and (3) an age of 70 or older in 2021. These filters limited the number of members with AD to 835. The same filters were applied to members without AD for the analysis and resulted in 101,084 non-AD members.
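The three inclusion criteria above can be expressed as a simple predicate. The sketch below is illustrative only: the `Member` record and its field names are hypothetical, and coverage is tracked at year granularity for simplicity, whereas real enrollment records would use exact dates.

```python
from dataclasses import dataclass


@dataclass
class Member:
    # Hypothetical record layout for illustration.
    member_id: str
    birth_year: int
    coverage_start_year: int
    coverage_end_year: int
    bcbs_primary: bool


def eligible_for_incidence(m, first_year=2012, last_year=2020,
                           age_year=2021, min_age=70):
    """Apply the three incidence-analysis filters to one member."""
    full_coverage = (m.coverage_start_year <= first_year
                     and m.coverage_end_year >= last_year)
    old_enough = (age_year - m.birth_year) >= min_age
    return full_coverage and m.bcbs_primary and old_enough


members = [
    Member("a", 1950, 2012, 2020, True),   # passes all three filters
    Member("b", 1955, 2012, 2020, True),   # too young in 2021
    Member("c", 1940, 2014, 2020, True),   # coverage starts too late
    Member("d", 1940, 2012, 2020, False),  # BCBS is not the primary provider
]
eligible_ids = [m.member_id for m in members if eligible_for_incidence(m)]
```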

Within the claim count analysis, the conditions were relaxed since it was already known that all members have made at least one claim for AD. Instead, we opted to find a balance between members having a sufficient amount of coverage information prior to and after AD diagnosis, without reducing the pool of members too much. As a result, we chose members that had at least 9 months of coverage both before and after their AD diagnosis, in which 74,153 members (approximately half of the entire AD cohort) were eligible.

Alzheimer’s disease incidence rate and survival analysis depending on drug use

Survival analysis was conducted on propensity-matched pairs for the top 200 drugs taken by AD patients based on the BCBS database records. The log hazard ratio distribution and the log hazard ratio cumulative graph are shown in Fig. 2. Figure 3 displays the QQ plots for raw and adjusted p-values from the survival analysis on the negative log 10 scale. The median hazard ratio for all drugs was 0.95. The inflation factor used for the p-value adjustment was approximately 2.46. We identified 22 drugs associated with AD incidence at adjusted p < 0.05 in our survival analysis: 15 with a decreased risk and 7 with an increased risk (Table 1).

Fig. 2 Distribution graph (left) and cumulative graph (right) of the log hazard ratio. Median hazard ratio is 0.95

Fig. 3 QQ plots for raw and adjusted p-values

Table 1 Drugs where the inflation-adjusted P value was less than 0.05

Out of 58 repurposed drug candidates that were in clinical trials for the treatment of AD as of 2020 [2], we found 25 drugs where (1) prescriptions for the drug were present among members that had made claims for AD within the BCBS database, providing evidence of use within the member cohort, and (2) drug users could be matched to non-users during propensity matching. These candidates were analyzed separately from the general analysis of the 200 drugs. P-value adjustment and correction were not performed due to the lower number of candidates. Table 2 shows the significant candidates (p < 0.05) that were present in the drug list, as well as their survival analysis statistics. A list of all the drugs that were tested can be found in the Supplementary tables. Overall, only 8 of the repurposed drug candidates showed a significant association with AD incidence within the BCBS database.

Table 2 Drugs that were found to be significant (raw p < 0.05) out of clinical trial candidates available for survival analysis

Healthcare utilization related to AD as represented through claim count analysis within the AD-specific population

We next examined the association between drug prescription claims and the number of AD claims. According to a number of studies, insurance claim count data can be considered a proxy for disease severity [16, 20, 28]. Although differences in AD-related claim counts between drug users and non-users may simply reflect differences in health care utilization, we also believe that strong associations may indicate a drug’s potential influence on the progression of AD.

Propensity-matched drug users and non-users were compared for the top 200 drugs taken by members that had made at least one claim for Alzheimer’s disease. The total number of Alzheimer’s disease-related claims made within the 9 months after the first claim was tallied for each member of a pair, and the paired t-test statistic and p-value were calculated. Figure 4 shows the QQ plots of the p-values (raw, adjusted, and corrected) for the 200 drugs in the analysis. Table 3 lists the drugs considered significant based on adjusted p-value (adjusted p < 0.05), indicating that drug users and non-users differ significantly in the number of AD claims made. Overall, 9 negative and 4 positive paired t-test statistics were deemed significant. All p-values calculated are provided in the table for reference; only one drug (quetiapine) was considered significant based on the corrected p-value.
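The paired comparison of claim counts can be sketched as below. The sign convention (user count minus non-user count, so that a positive statistic means more AD claims among users) and the function name are assumptions for illustration; the p-value would then be obtained from a t distribution with n - 1 degrees of freedom using a statistics library.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(user_counts, nonuser_counts):
    """Paired t statistic for matched claim counts.

    user_counts[i] and nonuser_counts[i] belong to the same matched pair.
    Assumes at least two pairs and that the differences are not all equal.
    """
    diffs = [u - n for u, n in zip(user_counts, nonuser_counts)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error


# Toy AD claim counts in the 9 months after the first AD claim.
t_stat = paired_t_statistic([5, 7, 9, 6], [3, 6, 5, 4])
```

Swapping the two arguments flips the sign of the statistic, so the direction of the subtraction needs to be fixed before interpreting "negative" and "positive" results.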

Fig. 4 QQ plots for raw and adjusted p-values from the claim count analysis

Table 3 Drugs with claim count differences where adjusted P value is lower than 0.05

Discussion

The current study is an exploratory analysis that identifies associations between drug treatment and Alzheimer’s disease in a large insurance claims database. We found that antibiotic, antiviral, and anti-inflammatory drugs had low hazard ratios in our study. Notably, the most significant drugs with lower hazard ratios are common antibiotics and anti-inflammatory drugs, particularly among those where the corrected P is less than 0.05. A number of studies have suggested that the inflammatory response in the brain is one of the key factors that lead to AD [1, 21]. The observation that taking anti-inflammatory medication is associated with a lower incidence of AD is consistent with this hypothesis. Likewise, microbial and viral infections have also been associated with a higher risk of AD [4, 13]. The lower hazard ratios for antibiotic and antiviral medications could therefore indicate a possible protective effect leading to reduced incidence of AD. Overall, the results may warrant further investigation into these drug categories to more clearly elucidate whether they can have a preventive influence on Alzheimer’s disease.

Our analysis also shows that drugs that treat mental illness have high hazard ratios and more AD claims among drug users. The significant drugs seen with high hazard ratios are those used to treat AD itself or other mental illnesses. This is also seen with the repurposed candidates in clinical trials, where the higher HRs trend towards drugs that are typically used to treat neurological illnesses, as well as in the claim counts, in which the significant drugs with a positive paired t-test statistic are also related to the treatment of mental illnesses. It is key to note that these trends are unlikely to be due to an effect of drug use; rather, they likely arise because the diseases themselves were not considered in the feature selection during propensity matching. It is particularly notable that the drugs donepezil and memantine, which are primarily used in the treatment of AD, are prescribed prior to the diagnosis of AD within the drug user group. One likely reason is that these drugs are being prescribed for mild cognitive impairment (MCI) and other forms of cognitive decline prior to the onset and eventual diagnosis of Alzheimer’s disease itself, although studies have shown that the efficacy of such drugs is modest at best [29, 34]. Other medications provide more intriguing insights into the connections between AD and a variety of other mental illnesses such as dementia, schizophrenia, anxiety, and depression. Relationships between these conditions and Alzheimer’s have been seen before, where a number of these conditions are associated with a higher risk of AD [8, 15, 18]. Furthermore, misdiagnosis of the early stages of AD as a different disorder may be a cause of the associations seen here. Misdiagnosis is prevalent for Alzheimer’s disease [10], which could be reflected in the candidates with higher hazard ratios seen in the study.

There are a number of important limitations to this study. First, the misdiagnosis and underdiagnosis of AD could affect the analysis results. Our analysis relied on a single ICD-9 code (331.0) to define AD-positive individuals. While including ICD-9 codes for other dementias could reduce the effect of underdiagnosis, our analysis conservatively defined AD positivity to avoid reporting biased results.

Second, our propensity score model had a finite limit on the number of patient features that could be used in our matching procedure. The model was primarily based on demographic data and individual disease status (also derived from the insurance claims data), which captured a broad cross-section of features for propensity matching. While the addition of more features could undoubtedly improve matching, our analysis was limited by practical considerations of compute requirements and generalizability.

Next, the filters that were applied for the incidence analysis merit discussion. Most notably, full coverage filtering introduces the possibility of survival bias, where only individuals that have “survived” (in this case, remained insured by the current group) are included. However, using partial instead of full coverage filters resulted in high inflation of the test statistic and an overall predisposition of drugs towards lower hazard ratios (see all Supplementary Results figures). Hence, for both consistency and reliability, our analysis considers individuals with the full coverage range, which reduces the number of poorly matched pairs with incomplete coverage data. In addition, the average life expectancy after AD diagnosis is approximately 8.3 years [3]; it is therefore reassuring that the 9-year coverage range would be expected to cover many of the patients diagnosed within the specified time frame, which likely reduces the influence of survival bias.

Finally, there are limitations originating directly from the dataset itself. These include, but are not limited to, the following: (1) the prevalence of a younger, working population among commercially insured members may bias the cohort towards a healthier disposition; (2) healthcare utilization could simply reflect better use of the healthcare system rather than disease severity; and (3) drug prescriptions do not guarantee the actual use of the drug by the patient, and conversely, a patient may use a particular drug without making a prescription claim.

In conclusion, the observational study described in this manuscript can be considered a compilation of the associations between the use of commonly prescribed drugs and Alzheimer’s disease within an insured population. Most notable is how observations made in this study are corroborated by other experimental studies. While the mechanisms by which neuroinflammation leads to AD have been established in experimental settings and are still an active field of study, and the comorbidities of AD with other mental illnesses have been observed before in clinical settings, it is interesting to observe that the trends carry over to an observational study of a commercially insured population. In both cases, where antibiotics were associated with a lower incidence of AD and mental illness drugs were associated with a higher incidence of AD, the results were unexpected (assuming the null hypothesis), and it is reassuring that there is mechanistic reasoning behind them. In future studies, we hope to further explore these drug-disease associations through secondary datasets and mechanistic validation. Overall, the results from the analysis are provided with the hope of giving direction and furthering progress on the complex task of understanding, treating, and preventing Alzheimer’s disease.