Background

Systemic light chain (AL) amyloidosis is a rare plasma cell disorder characterized by extracellular tissue deposition of misfolded and aggregated amyloid fibrils derived from clonal immunoglobulin free light chains [1]. The majority of patients with AL amyloidosis present with multisystemic involvement, with the heart, kidneys, gastrointestinal tract, nervous system, and musculoskeletal system often affected [2]. Over time, the disease often produces distinct and disparate symptoms across different organ systems, leading to healthcare visits with multiple specialists [3]. Existing literature suggests that more than one-third of patients report symptoms for a year or longer and that approximately half consulted four or more different physicians before their AL amyloidosis diagnosis was formally established [3]. Literature and clinical experience suggest a substantial delay in diagnosis from the initial onset of symptoms [3,4,5,6,7]. Because of this delay, many patients are diagnosed with advanced organ involvement, which is often associated with a poor prognosis [1].

Reducing the time from the onset of symptoms of precursor conditions to the diagnosis of AL amyloidosis is a critical unmet need. Taken together, the low incidence of AL amyloidosis, the non-specificity of its presenting symptoms, and the resulting reliance on numerous different healthcare providers to address them make diagnosing the disease a complex task. For these reasons, disease awareness and a high index of suspicion on the part of the treating physician are key to making a diagnosis of AL amyloidosis. In this paper, we contribute toward that goal by examining the incidence, timing, and co-occurrence of specific conditions likely to be symptoms related to AL amyloidosis (which we refer to as precursor diagnoses). Specifically, we leverage a large and diverse electronic health record (EHR) dataset to describe the timing of nineteen specific clinical precursor diagnoses and their co-occurrence within the three years prior to the patients’ diagnosis of AL amyloidosis.

Methods

Data source

Data for this observational retrospective cohort study of patients diagnosed with AL amyloidosis were drawn from TriNetX. TriNetX is a health research network providing access to high-quality, de-identified patient-level EHR data from more than 60 U.S. healthcare organizations. The data are de-identified in accordance with the HIPAA de-identification standard at 45 CFR § 164.514. Data were provided by the Medical College of Wisconsin Clinical Research Data Warehouse of the Clinical and Translational Science Institute using Honest Broker tools, which allow extraction of curated data in a fully de-identified format and therefore do not require Institutional Review Board review.

Cohort identification

As a first step, we selected a cohort of individuals with AL amyloidosis by identifying patients who had at least one inpatient visit or at least two outpatient visits with an associated AL amyloidosis International Classification of Diseases (ICD) diagnosis code (ICD-9: 277.30, 277.39; ICD-10: E85.81, E85.4, E85.89, E85.9) during the study period from 10/01/2015 to 12/31/2020. The date of the earliest of these occurrences was used as the date of the patient’s formal AL amyloidosis diagnosis. Because some of these codes, such as the unspecified amyloidosis ICD-10 code E85.9, may also capture patients without AL amyloidosis, we further required patients to have received chemotherapy or autologous bone marrow transplantation (BMT) between 90 days before and 365 days after the AL amyloidosis diagnosis, to make the cohort specific to AL amyloidosis. We restricted the sample to individuals with healthcare utilization information in TriNetX dating back at least 3 years before their AL amyloidosis diagnosis date. Our choice of look-back period is consistent with prior literature indicating that nearly 90% of AL amyloidosis patients report initial symptoms within 3 years, and 80% within 2 years, of their AL amyloidosis diagnosis [3].
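The cohort rules above can be expressed as a short filter. The following Python sketch is illustrative only and is not the study's extraction code: the `Encounter` record, its `setting` field, and the function names are hypothetical and do not reflect the TriNetX data model, and 3 years is approximated as 1,095 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Qualifying AL amyloidosis codes listed in the text (ICD-9 and ICD-10).
AL_CODES = {"277.30", "277.39", "E85.81", "E85.4", "E85.89", "E85.9"}
STUDY_START, STUDY_END = date(2015, 10, 1), date(2020, 12, 31)
LOOKBACK = timedelta(days=3 * 365)  # assumption: 3 years approximated as 1095 days


@dataclass(frozen=True)
class Encounter:
    """Hypothetical encounter record: date, care setting, and attached ICD codes."""
    day: date
    setting: str            # "inpatient" or "outpatient"
    codes: frozenset


def index_date(encounters):
    """Earliest AL-coded date if the patient has >=1 inpatient or >=2 outpatient AL visits."""
    hits = sorted(
        (e for e in encounters
         if STUDY_START <= e.day <= STUDY_END and e.codes & AL_CODES),
        key=lambda e: e.day,
    )
    n_inpatient = sum(e.setting == "inpatient" for e in hits)
    n_outpatient = sum(e.setting == "outpatient" for e in hits)
    if n_inpatient >= 1 or n_outpatient >= 2:
        return hits[0].day
    return None


def cohort_index_date(encounters, treatment_days, first_record_day):
    """Return time 0 if all inclusion criteria hold, else None."""
    t0 = index_date(encounters)
    if t0 is None:
        return None
    # Chemotherapy or autologous BMT from 90 days before to 365 days after time 0.
    treated = any(-90 <= (d - t0).days <= 365 for d in treatment_days)
    # At least 3 years of EHR history before time 0.
    has_history = first_record_day <= t0 - LOOKBACK
    return t0 if treated and has_history else None


visits = [
    Encounter(date(2019, 3, 1), "outpatient", frozenset({"E85.81"})),
    Encounter(date(2019, 4, 2), "outpatient", frozenset({"E85.81"})),
]
print(cohort_index_date(visits, [date(2019, 5, 1)], date(2014, 1, 1)))  # 2019-03-01
```

Keeping the index-date rule separate from the treatment and look-back checks mirrors the order in which the criteria are described, so each criterion can be audited on its own.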

Variable definitions

The precursor conditions of interest, along with their ICD codes, are listed in Table 1. These were categorized by organ system as clonal, cardiac, renal, gastrointestinal, multisystemic, and neurologic, with a residual miscellaneous category. The time from each precursor condition to AL amyloidosis diagnosis was calculated as the interval from the date of the earliest medical encounter with a code for that precursor diagnosis to the date of AL amyloidosis diagnosis.
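A minimal sketch of this time-to-diagnosis calculation follows, assuming each patient's encounters have already been reduced to a hypothetical mapping from precursor name to the dates of encounters carrying one of its ICD codes. Days are converted to months using an average month length of 365.25/12 days, which is our assumption rather than a convention stated in the study.

```python
from datetime import date

DAYS_PER_MONTH = 365.25 / 12  # assumption: average month length


def months_before_index(precursor_dates, t0):
    """Months from the earliest pre-diagnosis encounter of each precursor to time 0.

    precursor_dates: hypothetical mapping {precursor name: [encounter dates]}.
    Encounters after t0 are ignored; precursors never coded by t0 are omitted.
    """
    timing = {}
    for name, days in precursor_dates.items():
        prior = [d for d in days if d <= t0]
        if prior:
            timing[name] = (t0 - min(prior)).days / DAYS_PER_MONTH
    return timing


t0 = date(2020, 1, 1)
timing = months_before_index(
    {"dyspnea": [date(2019, 1, 1), date(2018, 7, 1)],   # earliest code wins
     "macroglossia": [date(2020, 6, 1)]},               # coded only after t0: omitted
    t0,
)
print({k: round(v, 1) for k, v in timing.items()})  # {'dyspnea': 18.0}
```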

Table 1 ICD codes for the precursor diagnoses of interest.

Statistical analysis

The date of AL amyloidosis diagnosis was defined as time 0, and the EHR history preceding time 0 was summarized at 6, 12, 24, and 36 months before time 0. By cohort definition, all patients had at least 36 months of EHR data prior to time 0.
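The cumulative counts summarized at each of these checkpoints (Table 2) can be sketched as follows, assuming per-patient times measured in months before time 0 (a hypothetical input layout). In this sketch, a precursor counts toward a checkpoint if its first code appeared at least that many months before time 0.

```python
CHECKPOINTS = (36, 24, 12, 6, 0)  # months before time 0


def cumulative_counts(patient_timings, checkpoints=CHECKPOINTS):
    """Mean number of precursor diagnoses already coded at each checkpoint.

    patient_timings: one dict per patient mapping precursor name -> months
    between its first code and time 0 (hypothetical layout). A diagnosis
    counts toward checkpoint m if it was first coded at least m months
    before time 0.
    """
    summary = {}
    for m in checkpoints:
        per_patient = [sum(t >= m for t in timing.values()) for timing in patient_timings]
        summary[m] = sum(per_patient) / len(per_patient)
    return summary


patients = [{"dyspnea": 30.0, "fatigue": 10.0}, {"edema": 3.0}, {}]
print(cumulative_counts(patients))
```

Patients with no precursor codes are kept in the denominator, so the averages describe the whole cohort rather than only patients with at least one precursor.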

The first period, from 36 to 24 months before time 0, was treated as the prevalence period and used to determine the baseline prevalence of the precursor diagnoses within the cohort, on the assumption that diagnoses recorded this early may not be due to AL amyloidosis. From 24 months before time 0 onward, newly appearing precursor diagnoses were counted as incident. For each precursor diagnosis, incidence was calculated as the proportion of patients with a new occurrence of the specific precursor diagnosis code between 24 months before the AL amyloidosis diagnosis and the first occurrence of the AL amyloidosis diagnosis code. The median time from the first instance of each precursor diagnosis to time 0 was estimated by kernel density estimation, using a scaled probit transformation to account for the boundaries at −36 months and time 0.
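The sketch below illustrates the incidence proportion and a boundary-aware median of this kind. It is not the authors' implementation: the whole cohort is assumed to be the denominator for incidence, times are expressed as months before time 0 (so the boundaries become 0 and 36), and the Gaussian kernel bandwidth follows a Silverman-type rule of thumb chosen for illustration. Because the median is preserved under monotone transformations, the median of the kernel density on the probit scale maps directly back to months.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal; inv_cdf is the probit function


def incidence(first_months_before, n_patients, window=24.0):
    """Share of the cohort whose first code for a precursor falls within `window` months of time 0.

    Patients first coded earlier (e.g. in the 36-24 month prevalence period) are
    prevalent, not incident. Assumption: the whole cohort is the denominator.
    """
    return sum(0.0 <= m <= window for m in first_months_before) / n_patients


def kde_median_months(months_before, span=36.0):
    """Median onset time from a Gaussian KDE fitted on the scaled probit scale.

    Times in [0, span] months before time 0 are rescaled to (0, 1) and mapped
    through the probit, so the density cannot leak past either boundary.
    """
    eps = 1e-6  # keep values off the boundaries so the probit stays finite
    ys = [STD.inv_cdf(min(max(m / span, eps), 1 - eps)) for m in months_before]
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two onset times")
    mu = sum(ys) / n
    sd = (sum((y - mu) ** 2 for y in ys) / (n - 1)) ** 0.5
    h = 1.06 * sd * n ** -0.2  # Silverman-type bandwidth (illustrative choice)
    if h == 0:
        raise ValueError("onset times must not all be identical")

    def kde_cdf(y):
        # CDF of the Gaussian KDE: average of the kernel CDFs.
        return sum(STD.cdf((y - yi) / h) for yi in ys) / n

    lo, hi = min(ys) - 10 * h, max(ys) + 10 * h
    for _ in range(200):  # bisection for kde_cdf(y) = 0.5
        mid = (lo + hi) / 2
        if kde_cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    # The median is invariant to monotone maps, so back-transform it directly.
    return span * STD.cdf((lo + hi) / 2)


print(incidence([30.0, 20.0, 5.0, 1.0], n_patients=10))  # 0.3
print(round(kde_median_months([6, 12, 18, 24, 30]), 3))  # 18.0
```

Fitting on the probit scale avoids the usual KDE artifact of placing probability mass after time 0 or before −36 months, which a kernel applied directly to the raw times would do near the edges.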

The pairwise co-occurrence between precursor diagnoses was calculated using Cole’s coefficient (CC) [8]. This coefficient measures the degree to which the observed proportion of joint occurrences exceeds or falls short of the proportion expected by chance alone [9]. It equals 0 when there is no association between the events and reaches +1 when one event is a subset of the other. Given the exploratory nature of our analysis, coefficients of 0.6 or greater were considered strong. Fisher’s exact test was used to assess the statistical significance of each co-occurrence.
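For concreteness, the following sketch computes Cole’s coefficient from a 2×2 table, with a = patients with both diagnoses, b and c = patients with only one of them, and d = patients with neither, together with a two-sided Fisher’s exact p-value. It follows the C7 form of Cole’s coefficient, in which ad − bc is divided by its largest attainable magnitude given the table margins; the function names are hypothetical, and the two-sided p-value sums all tables no more probable than the observed one (one common convention).

```python
from math import comb


def two_by_two(has_a, has_b):
    """2x2 counts (a, b, c, d) from per-patient boolean flags for diagnoses A and B."""
    pairs = list(zip(has_a, has_b))
    a = sum(1 for x, y in pairs if x and y)          # both A and B
    b = sum(1 for x, y in pairs if x and not y)      # A only
    c = sum(1 for x, y in pairs if y and not x)      # B only
    d = len(pairs) - a - b - c                       # neither
    return a, b, c, d


def cole_coefficient(a, b, c, d):
    """Cole's C7: ad - bc divided by its largest attainable magnitude given the margins."""
    n = a + b + c + d
    r, k = a + b, a + c     # number of patients with A, with B
    cross = a * d - b * c   # equals a*n - r*k
    if cross >= 0:
        # Maximum when the rarer diagnosis always occurs with the other (+1).
        denom = min(r, k) * n - r * k
    else:
        # Minimum at the smallest feasible overlap (-1 if they never co-occur).
        denom = r * k - max(0, r + k - n) * n
    return cross / denom if denom else float("nan")  # nan for degenerate margins


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value: sum of tables no more likely than observed."""
    n, r, k = a + b + c + d, a + b, a + c

    def prob(x):  # hypergeometric probability of x joint occurrences
        return comb(r, x) * comb(n - r, k - x) / comb(n, k)

    p_obs = prob(a)
    support = range(max(0, r + k - n), min(r, k) + 1)
    return min(1.0, sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7)))


table = two_by_two([True, True, True, True, False, False, False, False],
                   [True, True, True, False, True, False, False, False])
print(table, cole_coefficient(*table), round(fisher_exact_two_sided(*table), 4))
# (3, 1, 1, 3) 0.5 0.4857
```

Scaling by the attainable extreme, rather than by the sample size as in a phi coefficient, is what lets the coefficient reach +1 when one diagnosis is nested within the other even if the two diagnoses differ greatly in frequency.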

Results

There were 1401 patients with AL amyloidosis with at least 3 years of EHR history available. The median age of the cohort was 71 years (range 21–91); 646 (46%) patients were female, and by self-report 877 (63%) were non-Hispanic White, 79 (5.6%) Hispanic, 225 (16%) non-Hispanic Black, 17 (1.2%) non-Hispanic other, and 203 (14%) of unknown race and/or ethnicity. The median length of medical history available in the EHR was 5.7 years (range 3.0–10 years). Before AL amyloidosis diagnosis, precursor diagnosis codes spanned a median of 4 organ systems (interquartile range [IQR] 2–5) and a median of 5 specific precursor diagnoses (IQR 2–9). The number of diagnosed precursor conditions increased from an average of two at 24 months before AL amyloidosis diagnosis to three at 12 months and four at 6 months before diagnosis (Table 2).

Table 2 Cumulative number of precursor diagnoses over time prior to AL amyloidosis diagnosis.

The most prevalent precursor diagnoses recorded as diagnostic codes at the time of AL amyloidosis diagnosis were dyspnea (55.3%), fatigue (44.7%), neuropathy (39.9%), chronic kidney disease (37.5%), altered bowels (37.0%), edema (36.5%), and heart failure (36.3%). In contrast, fewer than 3% of patients had an incident diagnostic code for purpura (2.4%), nephritic syndrome (2.3%), nephrotic syndrome (2.3%), autonomic neuropathy (0.9%), or macroglossia (0.2%) (Table 3). The median time from the earliest precursor condition to AL amyloidosis diagnosis ranged from 3.2 to 21.4 months.

Table 3 Proportion of patients with precursor diagnoses based on time before AL amyloidosis diagnosis.

As illustrated in Fig. 1, the incidence of precursor conditions within 2 years before AL amyloidosis diagnosis ranged from 0.14% (macroglossia) to 34% (dyspnea). The estimated probability densities of the onset of each precursor diagnosis, by time before AL amyloidosis diagnosis, are provided in the supplementary figures, covering the entire 24-month period before diagnosis (Supplementary Fig. 1) and the 6 months immediately preceding diagnosis, when the appearance of new precursor diagnoses is greatest (Supplementary Fig. 2).

Fig. 1: Development of precursor diagnoses prior to AL amyloidosis diagnosis.

Cumulative incidence of diagnoses during the incidence period among patients with at least 3 years of EHR history, with dotted lines showing pointwise 95% confidence limits. The dashed vertical line shows the median diagnosis time during this period, with its 95% confidence interval shown by shading. The percentages of patients with a diagnosis at the start and end of the incidence period are shown.

In terms of co-occurrence, the strongest positive correlations were observed for dysphagia and macroglossia (CC = 0.86, p value = 0.002), autonomic neuropathy and dyspnea (CC = 0.76, p value = 0.009), heart failure and cardiomyopathy (CC = 0.74, p value < 0.001), and renal disease and nephrotic syndrome (CC = 0.74, p value < 0.001). Other notable associations included purpura and neuropathy (CC = 0.68, p value < 0.001), fatigue and autonomic neuropathy (CC = 0.68, p value = 0.004), purpura and fatigue (CC = 0.65, p value < 0.001), syncope and autonomic neuropathy (CC = 0.62, p value < 0.001), and altered bowels and autonomic neuropathy (CC = 0.60, p value = 0.006). Table 4 lists all precursor diagnoses and organ systems that co-occurred with a statistically significant coefficient of ≥ 0.6. Figure 2 shows the pairwise associations among all tested precursor diagnoses.

Table 4 Cole’s coefficient values for co-occurrence between precursor diagnoses.
Fig. 2: Co-occurrence of precursor diagnoses.

The extent of co-occurrence of diagnoses was quantified using Cole’s coefficient. It equals 0 when there is no association between the events, reaches +1 when one event is a subset of the other, and reaches −1 when the events never co-occur. Fisher’s exact test was used to assess the statistical significance of the co-occurrence.

Discussion

The diagnosis of AL amyloidosis is often delayed even though patients report multiple symptoms and see different specialists for care over several months to years. This context underscores the critical unmet need to reduce the time from the initial onset of symptoms to the diagnosis of the disease. To address this need, we leveraged a large EHR dataset to investigate the timing and co-occurrence of specific precursor diagnoses occurring before the diagnosis of AL amyloidosis. These precursor diagnoses, such as dyspnea, fatigue, edema, pain, and proteinuria, among others, were derived from symptoms and signs endorsed by many AL amyloidosis patients [4, 10]. We were particularly interested in whether they were catalogued as ICD-coded diagnoses in the medical history and, if so, how long before the diagnosis of AL amyloidosis they were recorded. By identifying the proportion of patients with these precursor diagnoses and examining their timing and co-occurrence, our study uses EHR data to shed light on the diagnostic process in this rare multisystemic condition.

Our prior work suggests that AL amyloidosis patients have a high prevalence of precursor diagnoses [6]. In the current analyses, we studied the pattern and timing of these diagnoses prior to the diagnosis of AL amyloidosis using the same data source. Concordant with findings by others [7] and with patient surveys [3], our analysis confirms that several symptoms of the disease are highly prevalent and are diagnosed in the EHR well before AL amyloidosis is diagnosed, with median times of 3.2 to 21.4 months before AL amyloidosis diagnosis. This supports efforts to develop predictive algorithms toward earlier diagnosis.

The most common precursor diagnoses were dyspnea, fatigue, heart failure, edema, altered bowels, neuropathy, and chronic kidney disease. Fatigue is the most common symptom of the disease, reported by 80% of AL amyloidosis patients [4]. Our data show that fatigue is also one of the most frequently catalogued precursor diagnoses as an ICD code, second only to dyspnea, seen in 45% of patients at a median of 15.6 months before the diagnosis of AL amyloidosis. Other common AL amyloidosis symptoms, such as dyspnea and edema, are also commonly recorded as ICD codes by healthcare providers one year or more before the diagnosis of AL amyloidosis. This concordance with known symptoms of the disease supports the feasibility of using EHR diagnosis codes to create algorithms that could shorten the time from symptom onset to AL amyloidosis diagnosis.

When assessing co-occurrence of precursor diagnoses, the strongest correlations were often between precursor diagnoses within the same organ system/category, e.g., cardiomyopathy and heart failure, or nephrotic syndrome and renal disease. This is concordant with the expected pathology of AL amyloidosis: when an organ is involved, the disease would be expected to cause multiple symptoms and signs related to that organ system. Other strongly co-occurring precursor diagnoses belonged to different organ systems/categories but made intuitive sense, with one plausibly a downstream effect of the other, e.g., autonomic neuropathy and syncope, or macroglossia and dysphagia.

The majority of AL amyloidosis patients have more than one organ involved; thus, correlations between organ systems were of greater interest in our analysis. Here we saw numerous strong correlations across organ system categories, including autonomic neuropathy and dyspnea, neuropathy and purpura, fatigue and purpura, and fatigue and autonomic neuropathy. The organ system pairs showing the greatest correlation were cardiac and gastrointestinal, cardiac and multisystemic, gastrointestinal and other, neurologic and multisystemic, and neurologic and other. Other ICD code-based precursors that we did not consider in the current analysis and that could be added in future iterations include weight loss, constipation, elevated alkaline phosphatase, alopecia, nail dystrophy, and spontaneous ecchymosis.

It is crucial to acknowledge the limitations of our study, inherent to the use of EHR data in research, when interpreting the results. For example, misdiagnoses, coding errors, and variations in which symptoms are recorded can lead to incorrect associations or missed precursor diagnoses. Healthcare organizations within the TriNetX network may differ in their EHR systems and diagnostic coding practices, which may limit the generalizability of the findings. The early symptoms of AL amyloidosis are nonspecific and can mimic other, more common conditions. Our study assumes that precursor diagnoses represent early symptoms of AL amyloidosis, but we did not adjust for comorbidities that may drive the onset of many of the precursor diagnoses; for example, diabetes may lead to neuropathy, proteinuria, and cardiomyopathy. Selecting a cohort with at least 3 years of prior medical history allowed us to assess baseline prevalence in the first year of that 3-year window and incidence in the subsequent two years preceding the AL amyloidosis diagnosis. On the other hand, requiring 3 years of medical history excludes patients who established care within a medical system only shortly before presenting with the constellation of amyloidosis symptoms. Presumably, some of these are patients suspected of having AL amyloidosis at a smaller practice (where there may have been diagnostic delays) who were then referred to a larger center for expeditious workup. Because the goal of this study was to establish the timing of precursor diagnoses, we had to start with a cohort of ‘established patients’ who had ongoing interactions with a medical system and subsequently developed AL amyloidosis. Lastly, we lacked detailed clinical context, making it challenging to understand the severity and clinical significance of the precursor diagnoses.
Nevertheless, our approach is relevant because it reflects symptoms that are clinically recognized by healthcare providers. An important strength of this analysis is the racial diversity of our dataset, which is often lacking in published clinical research on AL amyloidosis from the U.S. [11].

In conclusion, leveraging real-time EHR data enabled us to identify a large and diverse cohort of AL amyloidosis patients in which to examine diagnostic patterns and to demonstrate the potential for earlier diagnosis of this complex disease. Our findings lay the foundation for developing clinical algorithms that use ICD codes to support earlier recognition of AL amyloidosis.