How biomarker patterns can be utilized to identify individuals with a high disease burden: a bioinformatics approach towards predictive, preventive, and personalized (3P) medicine

Bertele, Nina; Karabatsiakis, Alexander; Buss, Claudia; Talmon, Anat

doi:10.1007/s13167-021-00255-0


Research
Open access
Published: 29 September 2021

Volume 12, pages 507–516, (2021)

EPMA Journal


Abstract

The prevalence of non-communicable diseases, such as depression and a range of somatic diseases, is continuously increasing, which calls for simple and inexpensive ways to identify high-risk individuals who can be targeted with predictive and preventive approaches. In study 1, we used k-means cluster analysis to identify biochemical clusters (based on C-reactive protein, interleukin-6, fibrinogen, cortisol, and creatinine) and examined their link to diseases. Analyses were conducted in a US American sample (from the Midlife in the US study, N = 1234) and validated in a Japanese sample (from the Midlife in Japan study, N = 378). In study 2, we investigated the link between the biochemical clusters from study 1 and childhood maltreatment (CM). Of the three identified biochemical clusters, one (with high inflammatory signaling and low cortisol and creatinine concentrations) indicated the highest disease burden. This high-risk cluster also reported the highest CM exposure. The current study demonstrates how biomarkers can be used to identify individuals with a high disease burden and thus may help to target these high-risk individuals with tailored prevention/intervention, towards personalized medicine. Furthermore, our findings raise the question of whether the identified biochemical clusters have predictive character, as a tool to identify high-risk individuals and enable targeted prevention. The finding that CM was most prevalent in the high-risk cluster provides a first hint that the clusters could indeed have predictive character and highlights CM as a central disease susceptibility factor and possibly as a leverage point for disease prevention/intervention.


Introduction

The global burden of disease—current situation

The prevalence and incidence of non-communicable diseases (NCD) are continuously increasing, placing a heavy socio-economic and medical burden on healthcare systems. In economic terms, US healthcare costs increased for 4 consecutive years, reaching 3.8 trillion US dollars in 2019 [1, 2]. NCD caused 90% of these costs because they entail massive long-term treatment costs and often present with comorbidities [1, 2]. Thus, the prevention of NCD, and in this context the identification of at-risk individuals and sensitive biomarkers of disease risk, is more important than ever, as it represents a leverage point to reduce both the economic and the individual burden of disease.

The contribution of the current study

The two consecutive studies presented here demonstrate how routinely assessed biomarkers can be bioinformatically clustered and used to identify individuals with a high disease burden. Specifically, in study 1, we employed a clustering approach based on C-reactive protein (CRP), interleukin-6 (IL-6), fibrinogen, cortisol, and creatinine concentrations in a US cohort and validated the identified clusters in a Japanese cohort (for a study overview, see Fig. 1). We then linked these biochemical clusters to documented diseases including depression, heart disease, hypertension, stroke, peptic ulcer disease (PUD), and cancer. In study 2, we tested the association of childhood maltreatment (CM), a well-established early-life risk factor for developing mental and somatic disorders, with diseases as well as with the biochemical clusters identified in study 1.

General methods

Description of the study populations

US American sample

Data were drawn from the biomarker subsample of the Midlife in the United States (MIDUS) study between 1995 and 1996 [3]. For more information about the project, please see http://www.midus.wisc.edu/data/index.php. A total of 1255 individuals participated in the biomarker study, and complete biomarker data were available for 1234 of them.

Japanese sample

Data were drawn from the Midlife in Japan (MIDJA) study (N = 1027). In 2009–2010, biomarker data were generated for a subset of these participants (N = 378). Data were obtained analogously to MIDUS.

Study 1

Introduction

The importance of risk evaluation in personalized medicine, targeted prevention, and predictive diagnostics

According to the Global Burden of Disease study (2017), between 1990 and 2017, disability-adjusted life years (DALYs) due to NCD increased from 1.2 to 1.6 billion. With that, NCD caused more than 60% of DALYs worldwide [4]. NCD not only cause individual suffering but also burden society as a whole, due to massive monetary and non-monetary costs [4, 5]. Relying on interventions, no matter how effective they are, after individuals are already ill is therefore a fundamental fallacy. Instead, current developments require simple and inexpensive ways to identify high-risk individuals to target with both preventive and interventive approaches. Furthermore, it is becoming increasingly clear that many well-established risk factors (such as body mass index (BMI) outside the normal range [6] or genetic risk factors [7, 8]) that are supposed to help identify individuals at high risk for certain diseases are not independent of the individual environment and do not behave the same way across different individuals, highlighting the importance of personalized, tailored approaches in the context of preventive medicine. The presence of one particular risk factor might have little predictive value for negative outcomes unless it is considered systemically/holistically, that is, in the context of other physiological, environmental, psychological, and biochemical parameters and processes [e.g., 6–8]. Despite these intricacies, disease-predictive measures should also be cost-efficient so that they can be implemented in the healthcare system.

The allostatic load index: chances and limitations

One particular concept that has become well-established in the literature is the concept of allostatic load (referring to the cumulative burden of chronic stress and adverse life events) with its suggested allostatic load index (ALI) [9]. ALI is a cumulative multi-system risk score based on physiological and biochemical measures [10]. For each system, risk indices are calculated as the proportion of biomarkers for which an individual falls into predefined high-risk quartiles.
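
The scoring rule described above can be illustrated with a minimal Python sketch. It is not the published ALI implementation: the two systems, the biomarker names, and the cutoff values in `HIGH_RISK_CUTOFFS` are hypothetical placeholders, and the sketch assumes that "high risk" always means the upper quartile, which does not hold for every biomarker.

```python
from typing import Dict

# Hypothetical upper-quartile cutoffs per physiological system.
# These numbers are illustrative assumptions, not values from the ALI literature.
HIGH_RISK_CUTOFFS: Dict[str, Dict[str, float]] = {
    "inflammation": {"crp": 3.0, "il6": 3.2, "fibrinogen": 390.0},
    "cardiovascular": {"systolic_bp": 140.0, "diastolic_bp": 90.0},
}


def system_risk_index(values: Dict[str, float], cutoffs: Dict[str, float]) -> float:
    """Proportion of a system's available biomarkers at or above the high-risk cutoff."""
    available = [name for name in cutoffs if name in values]
    if not available:
        raise ValueError("no biomarkers available for this system")
    flagged = sum(values[name] >= cutoffs[name] for name in available)
    return flagged / len(available)


def allostatic_load_index(person: Dict[str, Dict[str, float]]) -> float:
    """Sum of the per-system risk indices over the systems measured for this person."""
    return sum(
        system_risk_index(person[system], HIGH_RISK_CUTOFFS[system])
        for system in HIGH_RISK_CUTOFFS
        if system in person
    )


# Hypothetical participant: 2 of 3 inflammation markers and 1 of 2
# cardiovascular markers fall in the high-risk range.
example = {
    "inflammation": {"crp": 4.1, "il6": 1.0, "fibrinogen": 400.0},
    "cardiovascular": {"systolic_bp": 120.0, "diastolic_bp": 95.0},
}
ali = allostatic_load_index(example)
```

Because the index is a plain sum of per-system proportions, any interaction between systems is invisible to it by construction.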

As a systemic risk score, ALI is predictive of various outcomes, including all-cause mortality [11, 12], but there are some critical limitations concerning its conceptualization. First, calculating a risk score as the sum of different system risk scores does not account for intersystemic interactions and the possible predictive effect of these interactions. This gap is unfortunate because ALI includes parameters that are not independent of each other, such as BMI and blood pressure [13]. Another concern is the practicability of ALI and its implementation in the healthcare system. While ALI considers parameters that can be assessed relatively simply, it is still likely that, for most individuals, parameters are only partially available, possibly limiting the predictive power of ALI. Taken together, ALI is a profound concept but artificially splits physiological processes that are woven into a holistic allostatic reaction, as acknowledged by the developers of ALI [14]. Furthermore, ALI lacks practicability, which is underlined by the fact that, to date, ALI has not been implemented in routine diagnostics.

A novel biochemical clustering approach

Given the rising number of NCD, there is an urgent need to develop an approach that is practicable, cost-efficient, and, at best, based on biomarkers that are assessed in clinical routine, so that high-risk individuals can be identified and targeted with specific preventive steps. The current study aimed to develop and validate an easily accessible measure that can realistically be implemented in routine diagnostics. Towards this aim and building on ALI, five biomarkers were chosen because they cover broad physiological functionality: CRP, fibrinogen, and IL-6 are pro-inflammatory markers (i.e., positively associated with inflammation); cortisol, as the end product of the hypothalamus–pituitary–adrenal axis, is an immune-modulatory mediator playing a crucial role in the stress response; and creatinine is important for cellular energy metabolism [15,16,17,18,19]. In contrast to ALI, a clustering approach based on these biomarkers makes it possible to account for linear and non-linear interactions among them and to link the resulting clusters to a range of mental and somatic diseases. To examine the association between biochemical clusters and diseases, we focused on depression, heart disease, hypertension, stroke, PUD, and cancer, as these show the highest prevalence, the fastest increase in numbers, and the most comorbidities globally [4]. We first clustered biochemical markers and related them to odds ratios (ORs) for diseases in a US population sample and then repeated this process in a Japanese cohort. To ensure representativity, both samples were recruited via random-digit dialing, which qualifies them for studies with results generalizable to the population. To ensure that the selected biomarkers and their clustering demonstrate robust applicability across different cultures and ethnicities [20], we chose one US American and one Japanese sample to generate and validate the biochemical clusters.

Methods

Collection of biosamples and the assessment of biochemical markers

MIDUS

Blood samples were collected after overnight fasting for the assessment of CRP, IL-6, and fibrinogen, according to the manufacturers' guidelines (Dade Behring Inc., Deerfield, IL for CRP and fibrinogen; R&D Systems, Minneapolis, Minnesota for IL-6) [20]. Plasma levels of CRP and fibrinogen were assayed using an immunonephelometric assay; IL-6 was quantitatively assessed using an enzyme-linked immunosorbent assay (ELISA). The laboratory inter-assay coefficient of variation was 5.7% for CRP, 13% for IL-6, and 2.6% for fibrinogen, all below the 20% acceptable range [21].
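
For readers unfamiliar with the inter-assay coefficient of variation, the following sketch shows how such a figure is typically computed: the standard deviation of repeated measurements of the same control sample across assay runs, expressed as a percentage of their mean. The control values are hypothetical and are not MIDUS laboratory data.

```python
from statistics import mean, stdev


def coefficient_of_variation(measurements):
    """Return the coefficient of variation in percent: 100 * SD / mean."""
    m = mean(measurements)
    if m == 0:
        raise ValueError("mean of measurements is zero; CV is undefined")
    return 100.0 * stdev(measurements) / m


# Hypothetical repeated measurements of one control sample across five assay runs.
control_runs = [2.0, 2.2, 1.9, 2.1, 2.0]
cv = coefficient_of_variation(control_runs)

# The text treats values below 20% as acceptable.
acceptable = cv < 20.0
```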

To obtain a cumulative cortisol and creatinine measure, 12-h overnight urine samples were collected between 7 PM and 7 AM. Enzymatic colorimetric assays and liquid chromatography-tandem mass spectrometry were performed at the Mayo Medical Laboratory in Rochester, Minnesota. Data were excluded if participants had renal failure or severe renal decline according to glomerular filtration rate [21].

MIDJA

CRP, IL-6, and fibrinogen were assessed analogously to MIDUS, while cortisol was assessed in saliva (three times per day on 3 consecutive days) and creatinine was assessed in blood. The 9 saliva measurements were averaged and used as a representative marker for cortisol concentrations [22].

Diseases

Depression, heart disease, hypertension, stroke/transient ischemic attack (TIA), PUD, and cancer were assessed via self-report. Participants were asked if they were ever diagnosed with any of these diseases before/at the time of study participation.

Statistical analyses

First, potential collinearity among the biomarkers was assessed by calculating Pearson correlations among CRP, fibrinogen, IL-6, creatinine, and cortisol. After randomizing the order of participants [23], we performed a k-means cluster analysis with these markers in the MIDUS sample using IBM SPSS Statistics 27. To assess the stability of the clusters, we repeated the clustering process in subsamples [23]. Specifically, we conducted a median split based on age and performed the clustering for each group separately to assess whether the clusters are age-dependent. For the same purpose, we repeated the clustering procedure after excluding participants with a BMI outside the healthy range (below 18 or above 35). Next, the biochemical clustering performed for the whole MIDUS sample was repeated in the MIDJA cohort. Finally, z-tests were used to compare ORs for diseases among clusters.
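
The final step, comparing ORs with a z-test, can be sketched as follows. The text does not spell out the exact test statistic, so this is one common formulation offered as an assumption: each OR is computed from a 2x2 table, and two ORs are compared on the log scale using the standard error of the log OR. The counts are hypothetical, and the formula treats the two estimates as independent, which is only an approximation when both tables share a reference group.

```python
import math
from statistics import NormalDist


def log_odds_ratio(cases_in, noncases_in, cases_ref, noncases_ref):
    """Return (log OR, standard error) for a 2x2 table of a cluster vs. a reference group."""
    counts = (cases_in, noncases_in, cases_ref, noncases_ref)
    if min(counts) <= 0:
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((cases_in * noncases_ref) / (noncases_in * cases_ref))
    se = math.sqrt(sum(1.0 / c for c in counts))
    return log_or, se


def compare_odds_ratios(table_a, table_b):
    """Two-sided z-test for the difference between two log odds ratios."""
    log_a, se_a = log_odds_ratio(*table_a)
    log_b, se_b = log_odds_ratio(*table_b)
    z = (log_a - log_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical counts: (cases in cluster, non-cases in cluster,
#                       cases in reference, non-cases in reference)
cluster_a = (40, 60, 200, 700)
cluster_b = (30, 170, 200, 700)
z_stat, p_value = compare_odds_ratios(cluster_a, cluster_b)
```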

Results

Preliminary analyses

In both MIDUS and MIDJA samples, biomarkers were positively correlated (see SI Tables 4 and 5).
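
A pairwise Pearson correlation check of this kind could look like the sketch below. The values in `toy` are invented for illustration and are not MIDUS or MIDJA data; the function names are likewise hypothetical.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlation_table(columns):
    """Correlation for every unordered pair of named biomarker columns."""
    return {
        (a, b): pearson_r(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


# Invented toy values for three of the five biomarkers.
toy = {
    "crp": [0.5, 1.2, 3.4, 6.0, 2.1],
    "il6": [0.8, 1.1, 2.9, 4.5, 1.9],
    "fibrinogen": [300.0, 320.0, 410.0, 450.0, 350.0],
}
table = correlation_table(toy)
```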

In MIDUS, 24.1% of the participants (currently or previously) had depression, 11.5% heart disease, 37.1% hypertension, 4.3% stroke/TIA, 5.3% PUD, and 13.6% cancer. In MIDJA, 4.5% of the participants had depression, 5.6% heart disease, 19.3% hypertension, 1.1% stroke/TIA, 8.3% PUD, and 5.1% cancer.

K-means clustering

We used z-standardized biomarkers for k-means clustering and evaluated the clustering results from k = 2 to 6 clusters for MIDUS. When k = 2, the cluster patterns were not distinct enough; when k = 4 or above, some clusters were very small. Balancing the principle of parsimony against the need for meaningful differences among clusters, we selected k = 3 for the subsequent analyses. Figure 2 illustrates the distributions of the three identified clusters with respect to the biochemical markers. We replicated all three clusters in the younger MIDUS cohort as well as clusters 1 and 2 in the older MIDUS cohort (SI Figs. 7 and 8). We further replicated all three clusters in the BMI-restricted MIDUS cohort (SI Fig. 9).
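
The model-selection procedure described here (z-standardize, run k-means for k = 2 to 6, inspect cluster sizes) can be sketched in plain Python. This is an illustrative implementation of Lloyd's algorithm with random initial centers, not the SPSS routine used in the study, which initializes differently. The simulated data, seeds, and function names are assumptions.

```python
import math
import random
from statistics import mean, stdev


def z_standardize(rows):
    """Scale every column to mean 0 and sample SD 1."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    if any(s == 0 for s in sds):
        raise ValueError("cannot standardize a constant column")
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def k_means(rows, k, seed=0, max_iter=100):
    """Lloyd's algorithm; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [-1] * len(rows)
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(row, centers[j])) for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = [mean(col) for col in zip(*members)]
    return labels, centers


# Simulated five-marker data drawn around three invented profiles
# (column order: CRP, IL-6, fibrinogen, cortisol, creatinine).
data_rng = random.Random(42)
profiles = [(0, 0, 0, 0, 0), (3, 3, 3, -1, -1), (0, 0, 0, 3, 3)]
raw = [[data_rng.gauss(c, 1.0) for c in p] for p in profiles for _ in range(40)]
z = z_standardize(raw)

# Cluster sizes for each candidate k, to spot very small clusters.
sizes_by_k = {}
for k in range(2, 7):
    labels, _ = k_means(z, k)
    sizes_by_k[k] = [labels.count(j) for j in range(k)]
```

K-means can settle in a local optimum that depends on the initial centers, so in practice one would repeat the run with several seeds before comparing solutions across k.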

Then, the 3-cluster solution from MIDUS was validated in the MIDJA sample; the results are shown in Fig. 3.

As depicted in Figs. 2 and 3, cluster 1 is characterized by average levels in all biochemical measures. Cluster 2 is characterized by high and above-average levels for CRP, IL-6, and fibrinogen. Cluster 3 is characterized by high and above-average levels for cortisol and creatinine but average concentrations of CRP, fibrinogen, and IL-6.

Associations between biochemical clusters and disease states

MIDUS

Cluster 2 had the highest ORs for all considered diseases compared with clusters 1 and 3 (Fig. 4, SI 10).

MIDJA

Cluster 3 had the highest ORs for heart disease, hypertension, and PUD, cluster 2 had the highest ORs for stroke and cancer, and cluster 1 had the highest ORs for depression (Fig. 5, SI 10.1).

To compare this cluster-based approach to a well-established clinical biomarker that is associated with a broad range of NCD, the number of diagnoses among individuals in cluster 2 was compared to the number of diagnoses among individuals with CRP concentrations above the clinical cutoff (> 3 mg/L) [24]. The disease burden in cluster 2 was higher with 1.6 diagnoses (SD = 1.16; 0.9 diagnoses for individuals not assigned to cluster 2) compared to individuals above the CRP cutoff with 1.2 diagnoses (SD = 1.07; 0.9 diagnoses for individuals below the cutoff).
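
The descriptive comparison above amounts to splitting the sample twice, once by cluster membership and once by the CRP cutoff, and summarizing the number of diagnoses on each side of the split. A minimal sketch follows; the participant records and field names are hypothetical and chosen only to make the logic visible.

```python
from statistics import mean, stdev


def diagnosis_summary(participants, in_group):
    """Mean (and SD) of diagnosis counts inside a group, and mean outside it."""
    inside = [p["n_diagnoses"] for p in participants if in_group(p)]
    outside = [p["n_diagnoses"] for p in participants if not in_group(p)]
    return {
        "inside_mean": mean(inside),
        "inside_sd": stdev(inside),
        "outside_mean": mean(outside),
    }


# Hypothetical records, not study data.
participants = [
    {"cluster": 2, "crp_mg_l": 6.1, "n_diagnoses": 2},
    {"cluster": 2, "crp_mg_l": 4.0, "n_diagnoses": 1},
    {"cluster": 1, "crp_mg_l": 3.5, "n_diagnoses": 1},
    {"cluster": 1, "crp_mg_l": 0.8, "n_diagnoses": 0},
    {"cluster": 3, "crp_mg_l": 1.1, "n_diagnoses": 1},
    {"cluster": 3, "crp_mg_l": 3.2, "n_diagnoses": 0},
]

CRP_CUTOFF_MG_L = 3.0  # clinical cutoff named in the text

by_cluster = diagnosis_summary(participants, lambda p: p["cluster"] == 2)
by_crp = diagnosis_summary(participants, lambda p: p["crp_mg_l"] > CRP_CUTOFF_MG_L)
```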

Discussion

Three biochemical clusters in the general population

Findings reveal three distinct and interculturally stable biochemical clusters observable in the general population. Cluster 1 is characterized by average levels of all biomarkers, cluster 2 by high inflammation-related mediators coupled with low cortisol and creatinine, and cluster 3 by high levels of cortisol and creatinine. The stability of the clusters is supported by their replication in the MIDJA sample, in the BMI-restricted MIDUS cohort, and in the younger MIDUS cohort (below the age median). In the older MIDUS cohort (above the age median), only clusters 1 and 2 were replicated; cluster 3 was not. One explanation could be that, due to an age-related increase in systemic inflammation [25], older individuals were not assigned to cluster 3, which is characterized by low inflammation.

The link of biochemical clusters to disease states

Relating clusters to diseases, in MIDUS, cluster 2 showed the highest ORs for depression, heart disease, hypertension, stroke, and cancer (Fig. 4). These findings are supported by previous evidence suggesting that CRP, IL-6, and fibrinogen are associated with depression [26, 27], coronary heart disease [28,29,30,31], blood pressure [32], stroke [33,34,35], and cancer [36, 37]. However, in contrast to these previous studies, the clustering approach used in this study made it possible to account for well-known collinearities between biomarkers and thus promotes a more holistic perspective. Specifically, the findings build on previous studies suggesting a link between inflammation and diseases [25] by demonstrating that it might not be one specific biomarker but a specific biochemical pattern (i.e., high CRP, IL-6, and fibrinogen coupled with low cortisol and creatinine) that is associated with diseases. This idea is supported by the observation that individuals in cluster 2 show, descriptively, a higher disease burden than individuals above the clinically well-established CRP cutoff.

Interestingly, we found no differences in the ORs for PUD between clusters despite the role of inflammation in its pathology [38]. Future research may aim to further examine the role of inflammatory signaling in the pathology of PUD.

While the cluster with high levels of CRP, IL-6, and fibrinogen can be considered a high-risk cluster, cluster 3 with high levels of cortisol and creatinine but low inflammation may be considered a protective cluster in MIDUS. We found that ORs for most diseases were lower in cluster 3 not only as compared to the high-risk cluster but also as compared to cluster 1 with average levels of all biomarkers. Concerning cancer, this difference became significant, potentially suggesting a protective character of this cluster. This would be in contrast to studies suggesting a link between hypercortisolism and disease outcomes [39, 40]. However, the combination of low inflammation and high cortisol and creatinine as in cluster 3 might indicate the integrity of the glucocorticoid negative feedback system, protecting from negative health outcomes [41]. Longitudinal studies may examine the consequences of this specific biochemical pattern. Towards this aim, we will examine MIDUS follow-up data (10 years after biomarker assessments) with respect to mortality outcomes.

In MIDJA, cluster 2 seems to be a high-risk cluster only for stroke and cancer, while for the other diseases considered, cluster 1 or cluster 3 indicates the highest burden. One aspect to consider here is that the MIDJA sample (N = 378) and especially cluster 2 (N = 30) were very small. It is, therefore, possible that the present findings lack reliability. However, different biochemical patterns may be associated with different outcomes in the Japanese compared to the US American population because moderating mechanisms such as BMI, nutrition, and medication differ between populations [41]. This idea is supported by the finding that, although approximately 8% of participants were assigned to cluster 2 in both MIDUS and MIDJA, the disease burden in MIDJA was much lower than in MIDUS. This highlights the importance of the individual aspects of disease susceptibility mentioned above and the role of interactions among different cultural, lifestyle, and biochemical factors: while assignment of a US American individual to cluster 2 might be associated with a high disease burden, this might not be the case for a Japanese individual with a similar biochemical profile. Future studies should examine the identified biochemical clusters in other cultural contexts to promote a better understanding of their associative and predictive character in multiple populations. From a preventive perspective, this may also help to further refine targeted prevention, that is, to better understand which biochemical profile is associated with which disease susceptibility under which conditions.

Limitations

Our work has several strengths, such as the validation of the clusters in an independent Japanese sample and the representative character of the cohorts. Yet, the findings face limitations. First, the present study is cross-sectional, which does not allow causal inferences. Second, the MIDJA sample size was relatively small. It is, therefore, possible that the ORs lack reliability. Third, methodological inconsistencies between the cohorts (urine cortisol and creatinine levels in MIDUS; average saliva levels of cortisol and blood levels of creatinine in MIDJA) may have affected the clustering process. Fourth, diseases were assessed via self-report, which bears the risk of reporting bias.

Conclusion

While the interactions among biomarkers make the distinction of their outcomes challenging, the design of the current study helps to gain a better understanding regarding the biochemical patterns that are present in the general population and how these patterns contribute to different physiological states on a systemic scale. We identified and replicated three distinct biochemical signatures in two mid-life populations including one cluster with collinearly occurring elevated levels of CRP, fibrinogen, and IL-6 as well as low concentrations of cortisol and creatinine that indicated the highest prevalence of stroke and cancer.

Future longitudinal studies should aim to test the predictive character of the clusters found in this study, because, if clusters are indeed predictive in terms of risk evaluation, then they would represent a valuable clinical tool for both diagnostics and prevention of diseases. Specifically, if high-risk individuals can be identified by the clustering approach presented here, then these individuals could be provided with personalized treatment options including psychotherapy, anti-inflammatory drugs, and treatment supplements, e.g., nutrition and exercise plans.