Background

Cardiovascular disease (CVD) is one of the leading causes of mortality and morbidity, with a substantial impact on public health worldwide. Beyond the traditional risk factors for CVD, novel circulating blood biomarkers are frequently studied [1, 2], since they can be measured with relative ease and are capable of detecting subtle changes in the pathophysiological processes underlying CVDs.

Red cell distribution width (RDW) reflects the heterogeneity of red blood cell (RBC) volumes and is often used clinically as a diagnostic tool in patients with anaemia. Since Felker et al. [3] first identified that RDW may be useful for predicting both morbidity and mortality in heart failure (HF) patients, an increasing number of studies have shown that high RDW is associated with the incidence and prevalence of a broad range of CVDs, including atrial fibrillation [4], stroke [5, 6], and myocardial infarction [2, 7]. Although the specific mechanisms linking RDW to adverse cardiovascular outcomes have not been sufficiently investigated, it has often been proposed that low-grade inflammation or the actions of pro-inflammatory cytokines could be a common cause of high RDW and CVD [7]. It has been shown that inflammatory cytokines could inhibit the maturation of RBCs [8].

In this population-based study, our primary aim was to explore the potential associations between RDW and a panel of circulating proteins known or suggested to be associated with CVD pathology. The initial analysis was performed in a discovery sub-cohort, consisting of a random 2/3 of the cohort, with adjustment for potential confounding factors. Plasma proteins whose relationships with RDW were significant after Bonferroni correction were then tested in a validation sub-cohort consisting of the remaining 1/3 of the cohort.

Methods

Study population

The Malmö Diet and Cancer (MDC) cohort was established between 1991 and 1996. A total of 30,446 men and women from the city of Malmö were recruited by mail and newspaper advertisement and invited to a health examination at a screening centre. A random 50% of those examined between October 1991 and February 1994 (MDC-Cardiovascular Cohort, MDC-CC, n = 6103) were selected to undergo additional examinations for a cardiovascular sub-study. Of these, 4742 people had information on plasma proteins. Individuals with missing data for RDW (n = 3), smoking (n = 6), high-density lipoprotein (HDL) (n = 1), haemoglobin (HGB) (n = 1), or low-density lipoprotein (LDL) (n = 5) were excluded. After these exclusions, 4726 subjects (1886 men and 2840 women) with a mean age of 57.51 ± 5.96 years (mean ± standard deviation, SD) were analysed in our study.
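The participant flow above can be checked with simple arithmetic. The following is a minimal sketch that uses only the counts reported in the text; it assumes that no individual had more than one missing variable, which is consistent with the reported total but is not stated explicitly. The variable names are illustrative.

```python
# Counts as reported in the text above.
with_protein_data = 4742
missing = {"RDW": 3, "smoking": 6, "HDL": 1, "HGB": 1, "LDL": 5}

# Assumption: the missing-data groups do not overlap.
analysed = with_protein_data - sum(missing.values())

men, women = 1886, 2840
proportion_women = women / (men + women)

print(analysed, round(100 * proportion_women, 1))
```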

All data were pseudonymized during analytic work with no identity being revealed.

Baseline examination

Anthropometric measurements were made at baseline using standard procedures. Venous blood samples were drawn at the first visit at the screening centre. HGB and erythrocyte diameter were analysed in fresh, heparinized blood, using a fully automated assay (SYSMEX K1000 hematology analyzer; TOA Medical Electronics, Kobe, Japan). RDW was calculated as the width (fL) of the erythrocyte distribution curve at the relative height of 20% above the baseline [9]. Reference values were 36.4–46.3 fL in women and 35.1–43.9 fL in men [10]. The relationships between RDW, cardiovascular risk factors and incidence of cardiovascular disease have been presented in previous papers [4, 5, 7, 11]. Weight and height were measured in light indoor clothing, without shoes. Body mass index (BMI) was calculated as weight/height² (kg/m²). Smoking status was obtained from the self-administered questionnaire and was categorized into two categories: smoking (i.e., current or occasional smokers) and non-smoking (never smokers or former smokers). Diabetes was defined as self-reported physician-diagnosed diabetes, current use of diabetes medication, or venous whole blood glucose ≥ 6.1 mmol/L (corresponding to plasma glucose ≥ 7.0 mmol/L).
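The derived baseline variables can be illustrated with a short sketch. The function and parameter names below are hypothetical and are not taken from the study's data files; the thresholds and categories are those given in the text.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2 (weight divided by height squared)."""
    return weight_kg / height_m ** 2


def has_diabetes(self_reported, on_medication, whole_blood_glucose_mmol_l):
    """Diabetes as defined above: any one of the three criteria suffices."""
    return (
        self_reported
        or on_medication
        or whole_blood_glucose_mmol_l >= 6.1
    )


def is_smoker(status):
    """Two-category smoking variable; `status` labels are illustrative."""
    return status in {"current", "occasional"}


print(round(bmi(81.0, 1.80), 1))
```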

Laboratory measurements

HDL and glucose levels were analysed using standard procedures at the Department of Clinical Chemistry, Malmö University Hospital. LDL levels were calculated according to the Friedewald formula. Plasma proteins were measured in fasting EDTA-plasma that had been stored frozen at −80 °C from collection at the baseline examination until analysis.
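For reference, the Friedewald formula in its commonly stated mmol/L form estimates LDL as total cholesterol minus HDL minus triglycerides divided by 2.2. The sketch below assumes all inputs are in mmol/L and applies the usual validity limit of triglycerides below 4.5 mmol/L; the text does not say how such cases were handled in this cohort, so raising an error is an assumption of this example.

```python
def friedewald_ldl(total_chol, hdl, triglycerides):
    """Estimate LDL cholesterol (mmol/L) with the Friedewald formula.

    All inputs are in mmol/L. The estimate is generally considered
    unreliable when triglycerides are 4.5 mmol/L or higher.
    """
    if triglycerides >= 4.5:
        # Assumption: reject rather than estimate outside the valid range.
        raise ValueError("Friedewald formula not valid for TG >= 4.5 mmol/L")
    return total_chol - hdl - triglycerides / 2.2


print(round(friedewald_ldl(5.5, 1.4, 1.1), 2))
```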

Proteomics analysis

Ninety-two plasma proteins were analysed using the Olink Proseek Multiplex CVD Panel I 96 × 96 Kit (Olink Bioscience, Uppsala, Sweden), based on the Proximity Extension Assay (PEA) technology with the Fluidigm BioMark HD real-time PCR platform in 54 chip runs. PEA uses matched antibodies labelled with unique oligonucleotides, which bind to a targeted protein. This makes the probe pairs hybridize and create double-stranded signals, which are amplified and quantified by the PCR platform [12], generating Normalized Protein Expression (NPX) values that correspond to protein levels. The normalization procedure included a set of internal controls (incubation, extension and detection controls) and a total of 6 external controls for each plate, used to correct for variation between runs and plates (inter-plate controls) and for assessment of detection limits. The protein concentrations are presented as arbitrary units (AU) on a log2 scale. The limit of detection (LOD) is defined as 3 × standard deviations (SD) above background, based on the negative controls in each run. Protein values below the LOD were replaced with LOD/2. Intra- and inter-assay coefficients of variation for the various proteins, and information regarding the CVD proteomic panel, PEA technology, data normalization and standardization are available in detail on the Olink webpage (http://www.olink.com). Previous studies from this cohort have reported the plasma protein profiles in individuals with high cadmium concentrations and poor self-rated health [13, 14].
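The handling of values below the detection limit can be sketched as follows. This is an illustration under stated assumptions, not the vendor's implementation: "background" is taken to be the mean of the negative controls, the sample SD is used, and the LOD/2 substitution is applied directly to the values as given (the text does not specify whether it was applied on the log2 or the linear scale). Function names are hypothetical.

```python
from statistics import mean, stdev


def limit_of_detection(negative_controls):
    """LOD = background + 3 x SD of the negative controls in one run.

    Assumption: background is the mean of the negative controls.
    """
    return mean(negative_controls) + 3 * stdev(negative_controls)


def replace_below_lod(values, lod):
    """Replace every value below the LOD with LOD/2; keep the rest."""
    return [v if v >= lod else lod / 2 for v in values]


# Toy numbers for illustration only, not study data.
lod = limit_of_detection([1.0, 2.0, 3.0])
cleaned = replace_below_lod([4.0, 5.0, 6.5], lod)
print(lod, cleaned)
```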

Four proteins for which less than 75% of subjects had a valid measurement were excluded: Beta-nerve growth factor (Beta-NGF, n = 478), Protein S100-A12 (EN-RAGE, n = 128), Natriuretic peptides B (BNP, n = 696) and Interleukin-4 (IL-4, n = 29). In total, 88 proteins were used for analyses in this study. For analytic purposes, the cohort was divided randomly in an approximate 2:1 ratio (discovery cohort: 2/3 of the population; validation cohort: 1/3 of the population).
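The protein filter and the random 2:1 split can be illustrated with a short sketch. The function names, the seed and the use of Python's `random` module are assumptions made for this example; the text does not describe how the randomization was actually performed.

```python
import random


def keep_protein(n_valid, n_total, min_fraction=0.75):
    """Keep a protein if at least 75% of subjects have a valid measurement."""
    return n_valid / n_total >= min_fraction


def split_discovery_validation(ids, discovery_fraction=2 / 3, seed=0):
    """Randomly split subject IDs into discovery and validation sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_discovery = round(len(shuffled) * discovery_fraction)
    return shuffled[:n_discovery], shuffled[n_discovery:]


# Placeholder IDs standing in for the 4726 analysed subjects.
discovery, validation = split_discovery_validation(range(4726))
print(len(discovery), len(validation))
```

With 4726 subjects, rounding two thirds gives group sizes of 3151 and 1575, which match the discovery and replication sample sizes reported in the Results.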

Statistical analysis

Data are presented as mean ± SD for continuous variables with normal distribution and as percentages for categorical variables. Multiple linear regression was performed to explore the associations between RDW and different proteins (one protein at a time), with RDW as the dependent variable and the protein as well as other risk factors as independent variables. We used the standardized form of plasma proteins (i.e. the Z-score) to allow for direct comparisons of different biomarkers. Beta coefficients with 95% confidence intervals (CIs) are presented. Model 1 was adjusted for age and sex; model 2 was adjusted for potential confounding factors (i.e. age, sex, BMI, HGB, LDL, HDL, diabetes, smoking). The calculations were performed using IBM SPSS Statistics V.27 (www.spss.com). A p value < 5.68 × 10⁻⁴ (Bonferroni adjustment for 88 tests, 0.05/88) was considered significant in the discovery population (2/3 of subjects). A p value < 0.05 was used as the criterion for successful replication in the remaining 1/3 of the population. Since RDW and several plasma proteins are increased by smoking, we also examined the associations with RDW in never smokers as a sensitivity analysis. This sensitivity analysis was only performed for proteins with significant Bonferroni-adjusted p values. Pearson's correlation test was also performed to examine the relations between each pair of replicated plasma proteins.
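The per-protein model and the two-stage significance rule can be sketched as follows. The analyses in this study were run in SPSS; the code below is only an illustration in plain Python, with hypothetical function names, that standardizes a protein, fits an ordinary least squares model by solving the normal equations, and applies the discovery and replication thresholds. It returns coefficients only and does not compute p values or confidence intervals.

```python
from statistics import mean, stdev


def z_score(values):
    """Standardize values to mean 0 and SD 1 (sample SD)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def ols(predictors, outcome):
    """Least-squares coefficients [intercept, b1, b2, ...].

    `predictors` is a list of rows, one per subject, without the intercept.
    Solves the normal equations by Gauss-Jordan elimination.
    """
    rows = [[1.0] + list(r) for r in predictors]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, outcome))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


N_TESTS = 88
BONFERRONI_ALPHA = 0.05 / N_TESTS  # about 5.68e-4


def is_replicated(p_discovery, p_validation):
    """Significant in discovery (Bonferroni) and in validation (p < 0.05)."""
    return p_discovery < BONFERRONI_ALPHA and p_validation < 0.05


# Toy data for illustration only: RDW built from a known linear model.
protein_z = z_score([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
age = [50.0, 61.0, 55.0, 70.0, 48.0, 66.0]
rdw = [40.0 + 0.5 * z + 0.1 * a for z, a in zip(protein_z, age)]
coefficients = ols(list(zip(protein_z, age)), rdw)
```

Because the protein enters the model as a Z-score, its coefficient is the change in RDW (fL) per one SD increase in the protein, which is what makes the betas comparable across proteins.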

Results

Characteristics of subjects

The characteristics of the study population are shown in Table 1. Mean age was 57.5 years, 60% were women, and the prevalence of smoking was 22.4% in men and 21.0% in women. The distribution of RDW is illustrated in Additional file 8: Figure S3. The results from the proteomics analysis and a list of the proteins are presented in Additional file 1: Table S1.

Table 1 The characteristics of the study population

RDW in relation to plasma proteins

Discovery sample

Thirty-one of 88 plasma proteins were significantly associated with RDW in the discovery cohort (n = 3151, random 2/3) (p < 5.68 × 10⁻⁴) after adjustment for age and sex (Additional file 6: Figure S1). Thirteen of 88 plasma proteins showed significant associations with RDW in the discovery population (p < 5.68 × 10⁻⁴) after adjustment for age, sex, BMI, HGB, LDL, HDL, diabetes and smoking. The significant proteins were stem cell factor (SCF, inverse association), growth differentiation factor 15 (GDF-15), SIR2-like protein 2 (SIRT2), melusin (ITGB1BP2), matrix metalloproteinase-7 (MMP-7), hepatocyte growth factor (HGF), chitinase-3-like protein 1 (CHI3L1), interleukin-8 (IL-8), CD40 ligand (CD40-L), urokinase plasminogen activator surface receptor (U-PAR), matrix metalloproteinase-3 (MMP-3), prolactin (PRL) and myoglobin (MB) (Fig. 1, Additional file 2: Table S2).

Fig. 1 Red cell distribution width in relation to plasma proteins in the discovery cohort. The beta coefficient and 95% confidence interval (CI) were obtained from multiple linear regression performed separately for each protein. Adjustments: age, sex, BMI, HGB, LDL, HDL, diabetes, smoking. P < 5.68 × 10⁻⁴ is significant

Replication sample

Thirteen proteins were assessed for significance in the remaining 1/3 replication sample (n = 1575). Eleven of them were significantly associated with RDW (p < 0.05) (Fig. 2, Additional file 3: Table S3): SIRT2, SCF, ITGB1BP2, GDF-15, CHI3L1, CD40-L, MMP-7, IL-8, HGF, U-PAR and MMP-3. Among them, GDF-15 showed the most significant association with RDW (beta = 0.46, 95% CI: 0.29–0.63, p = 7.97 × 10⁻⁸). SCF was inversely associated with RDW (beta = −0.36, 95% CI: −0.52 to −0.21, p = 5.59 × 10⁻⁶). Scatterplots of RDW and four of the significant proteins (GDF-15, SIRT2, CHI3L1, SCF) are presented in Additional file 9: Figure S4.

Fig. 2 The associations between red cell distribution width and plasma proteins in the replication cohort. The beta coefficient and 95% confidence interval (CI) were obtained from multiple linear regression performed separately for each protein. Adjustments: age, sex, BMI, HGB, LDL, HDL, diabetes, smoking. P < 0.05 is significant

Protein–protein correlations

We analysed the correlations between the plasma proteins with significant relationships with RDW (Fig. 3, Additional file 4: Table S4). We found moderate correlations between most of the proteins, and all correlations were statistically significant (p < 0.01).

Fig. 3 Pair-wise correlations between plasma proteins with significant relationships with RDW. Correlations were assessed between each pair of proteins using Pearson's correlation test. Darker colour corresponds to stronger correlation
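The pair-wise analysis can be illustrated with a minimal sketch of Pearson's correlation coefficient applied to every pair of proteins. The function names and the toy values are hypothetical and are not study data; the sketch returns correlation coefficients only, without the associated significance test.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(proteins):
    """Map each ordered pair of protein names to its Pearson r."""
    names = list(proteins)
    return {
        (a, b): pearson_r(proteins[a], proteins[b])
        for a in names
        for b in names
    }


# Toy values for illustration only.
toy = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(toy)
```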

Sensitivity analysis in never smokers

Eleven of 13 plasma proteins were significantly associated with RDW in never smokers after multivariate adjustment. The significant proteins in never smokers were ITGB1BP2, SIRT2, PRL, CHI3L1, GDF-15, MMP-7, CD40-L, MMP-3, IL-8, HGF and SCF. The most significant negative association among them was for SCF (beta = −0.34, 95% CI: −0.48 to −0.20, p = 2.38 × 10⁻⁶) (Additional file 5: Table S5, Additional file 7: Figure S2).

Discussion

Previous studies have linked RDW to a wide range of cardiovascular outcomes and adverse prognosis in patients with CVD [15] and to incidence of CVD in studies from the general population [2, 4–7]. However, the underlying mechanisms by which elevated RDW levels are associated with adverse outcomes remain unclear. The present results show that several proteins with possible associations with CVD are associated with RDW. Eleven proteins (GDF-15, SIRT2, ITGB1BP2, MMP-7, SCF (inversely), CHI3L1, MMP-3, HGF, IL-8, U-PAR, and CD40-L) were significantly associated with RDW after adjustment for possible confounding factors, in both the discovery and the replication sample.

High RDW reflects a high heterogeneity of RBC volumes. There could be several fundamentally different reasons for increased RDW values. A high production of large immature erythrocytes, e.g., after major bleeding, is associated with high RDW. Similarly, if the RBCs are heterogeneous already when released into the circulation, this could also cause high RDW. After the reticulocytes have been released, they rapidly develop into smaller erythrocytes, and the volumes of the RBCs then gradually shrink over their life span. The average survival time for erythrocytes in the circulation is approximately 120 days; however, it has been shown that there is substantial variation between individuals [16]. Therefore, a high proportion of old and small RBCs could also result in high RDW. Patel et al. [17] proposed that high RDW is a result of delayed elimination of RBCs from the circulation, beyond the average survival time of 120 days. The delayed elimination of erythrocytes could be a physiological response to stress or poor health, with the purpose of saving energy and iron for the body [18], and thereby a non-specific marker of disease. This view is supported by the fact that RDW has been positively correlated with haemoglobin A1c (HbA1c), which is influenced by the life span of the RBCs, but not with plasma glucose [11, 17]. It is not possible to examine the possible reasons for high RDW in this study. However, it is likely that both individuals with a delayed elimination of old erythrocytes and individuals with a high production of large RBCs are present in this large study from the general population.

It is widely acknowledged that low-grade inflammation and chronic disease are associated with anaemia [19] and that several pro-inflammatory cytokines, such as IL-6 and tumor necrosis factor-α (TNF-α), could inhibit erythropoiesis and the actions of erythropoietin [20]. Inflammation includes the production of a wide range of pro-inflammatory cytokines as well as proteins with compensating and regulating effects [21]. It is noteworthy that several of the proteins associated with RDW in this study can be regarded as markers of inflammation (e.g. GDF-15, CD40-L, IL-8, MMP-3, MMP-7, U-PAR and CHI3L1), and it is possible that they are related to erythropoiesis, either directly or through other proteins in the inflammatory cascade. It is also noteworthy that many of these plasma proteins have been associated with increased risk of CVD [22–28].

GDF-15 is a protein with many functions, which has been reported to be elevated in various anaemic conditions [29]. GDF-15 has a role in the regulation of iron homeostasis [30], and it has been shown that GDF-15 expression is regulated by hypoxia [31] and iron depletion [32]. This could potentially explain its relationship with RDW in our study. It is also noteworthy that GDF-15 has been repeatedly associated with all-cause mortality [33] and CVD [22].

HGF has pleiotropic effects on the cardiovascular system [34, 35] and is involved in the activation of haematopoietic progenitor cells. It has been shown that HGF is produced by human bone marrow stroma cells and promotes proliferation, adhesion and survival of human haematopoietic progenitor cells [36–38]. It was recently shown that HGF is associated with progression of atherosclerosis, which could perhaps reflect a compensatory and repairing function of HGF in atherosclerosis [35].

By contrast, there was a negative association between RDW and SCF. SCF is a major stimulator of erythropoiesis and works synergistically with erythropoietin [39]. It could be speculated that low levels of SCF might be an indicator of reduced renewal of erythrocytes in the circulation and an increased population of older and smaller RBCs, which in turn could explain the negative correlation between RDW and SCF [40, 41]. It is noteworthy that low SCF was associated with increased risk of CVD in a recent study from the MDC cohort [42].

Strengths and limitations

We used a well-defined population-based cohort with extensive information about plasma proteins as well as potential confounding factors. The sample size was large enough to allow analysis of a random discovery and replication sample. The PEA technology is known to be very specific; however, the concentrations are given as arbitrary units and not in International System of Units (SI) units, which is a limitation. The present study included 88 plasma proteins that could be linked to CVD. However, we have no information about protein biomarkers for other diseases. Whether RDW is associated with biomarkers for other diseases should be examined in future studies. Another limitation is that the analysis was performed in a cross-sectional design, so we can only speculate about causality.

Conclusions

In this cross-sectional analysis of a population-based cohort, eleven of 88 plasma proteins showed significant associations with RDW that were replicated successfully in the validation sub-cohort. These proteins could be related to the increased cardiovascular risk in individuals with high RDW.