Background

Preeclampsia (PE) is a potentially life-threatening, multi-organ disease characterized by elevated blood pressure (> 140 mmHg systolic or > 90 mmHg diastolic) and proteinuria (≥0.3 g/day) after 20 weeks of gestation. PE affects 3–5% of all pregnancies and is a major cause of maternal mortality and morbidity. PE also leads to an increased risk of neonatal death and morbidity [1, 2]. Risk factors for PE include age, conditions such as chronic hypertension and diabetes, obesity, and a prior history or family history of PE [3]. Studies have shown that treatment with low-dose aspirin, started before 16 weeks of gestation, reduces the risk and incidence of PE [4,5,6]. Currently available biomarkers, such as serum angiogenic and antiangiogenic vasoactive agents including placental growth factor (PlGF) and soluble fms-like tyrosine kinase 1 (sFlt-1), are not sensitive enough for the early prediction of PE [7]. Clinical risk factors, including chronic hypertension, diabetes, obesity, primiparity and a previous history of PE, can be used in combination with known biomarkers to identify women at increased risk of developing PE [8]. However, there is still a need for new predictive biomarkers for PE.

The pathological changes of PE are initiated during early gestation, while symptoms manifest later [9]. Tools to predict PE before the clinical manifestation of the disease are needed. To be able to influence the pathological process, at-risk women should be identified in early pregnancy. As PE is usually characterized by proteinuria and urine contains a variety of proteins including plasma proteins that have passed through the glomerular filtration barrier, urine is ideal for proteomic analyses and can be collected noninvasively.

Previous urine proteomics studies have identified several proteins that could be of value for the diagnosis of PE and for distinguishing PE from other hypertensive disorders [10]. A study using SELDI-TOF mass spectrometry to analyze urine samples identified a proteomic signature characterized by fragments of SERPINA1 and albumin that could predict PE and distinguish it from other disorders [11]. A study using micro-time-of-flight mass spectrometry identified a model of 50 biomarkers that was associated with subsequent PE, although this model did not reliably predict PE at time points earlier than 28 weeks of gestation [12]. Another study identified 35 peptides that could distinguish severe or mild PE from controls [13]. These authors also identified a panel consisting of 22 peptides that could differentiate PE from chronic and gestational hypertension, as well as normotensive controls [14]. However, studies identifying individual proteins that can reliably predict PE in early gestation are still lacking.

In this study, we have analyzed the urine proteome of prospectively collected serial urine samples from seventeen pregnant women. We included women with clinical risk factors for PE, some of whom later developed PE, and women without known risk factors. The aim of this study was to identify candidates for new non-invasive biomarkers to predict PE in early gestation. Here, we present multiple urine proteins that are strong candidates for new predictive markers for PE.

Methods

Patient cohort and study design

The present nested case-control study is a part of a multidisciplinary “Prediction and prevention of pre-eclampsia” (PREDO) study [15, 16] approved by the Ethics committee of Obstetrics and Gynecology, Hospital District of Helsinki and Uusimaa (Dnro 3/E8/05). The study was carried out between September 2005 and December 2009 in ten participating hospital maternity clinics in Finland. The project has been previously described in detail [16]. Briefly, 1082 women were recruited, 972 with clinical risk factors for PE and 110 women without known risk factors as a control group. The participants were enrolled in this study at 12–14 weeks of gestation on the basis of existing clinical risk factors for PE (with the exception of those women without clinical risk factors who were enrolled as a control group) and their pregnancies were followed normally. Randomly selected women gave fasting blood samples and overnight urine samples at three time points (12–14, 18–20 and 26–28 weeks of gestation) during pregnancy. Written informed consent was obtained from all participants before joining the study. Pregnancy data were collected from the medical records of maternity clinics and hospitals. Pregnancy outcomes were ascertained by a jury of two medical doctors and a midwife who met face-to-face and agreed upon the diagnosis of each participant. PE was diagnosed as new-onset hypertension after 20 weeks of gestation with proteinuria > 0.3 g per day. This study was carried out in accordance with the Declaration of Helsinki.

In this study, we included seven women with clinical risk factors for PE who eventually developed PE before 34 + 0 weeks of gestation, defined as early-onset PE (PE group). However, none of these seven women had clinical disease at any of the time points when samples were obtained. Six women with clinical risk factors who did not develop PE (risk controls, RC) and four women without risk factors for PE (healthy controls, HC) were randomly selected as controls. The clinical risk factors present in the women are given in Table 1. In this sub-study, we chose to include women from the larger PREDO cohort from whom urine samples had been obtained at all three time points during pregnancy. A total of 51 urine samples were obtained from these seventeen women. Enrolment in the study did not impede PE diagnosis in these women.

Table 1 The clinical risk factors for PE the women in this study had when enrolled in the prospective PREDO cohort

The participants in this study were Caucasian of Finnish origin. Family history was asked about in a questionnaire but not obtained for all participants. For those women from whom information was obtained (PE group: 2 women, RC group: 4 women, HC group: 4 women), none were born from a pre-eclamptic pregnancy. The age of the women ranged from 23 to 42 years, with the median being 34 years. The median weight was 75 kg and the median pre-pregnancy body mass index (BMIBP) was 25.9 kg/m², increasing to 32.0 kg/m² at the end of the pregnancy. Detailed clinical parameters and measurements are given in Additional file 2.

Sample collection

Overnight urine samples were collected without preservative from each subject at three time points during pregnancy: at 12 + 0 to 14 + 0, 18 + 0 to 20 + 0, and 26 + 0 to 28 + 0 weeks and days of gestation. The overnight morning urine, which was collected in commercial plastic containers and kept refrigerated, was carefully mixed and at least two portions of 10 mL of urine from each collection were stored in deep-freeze tubes at −80 °C until analysis.

Sample preparation

Urine samples were thawed and 5 mL of urine was reduced (10 mM DTT) and alkylated (40 mM iodoacetamide) for 1 h each inside an Amicon filter (3 kDa MWCO, Millipore). Between consecutive steps, the previous solution was removed by centrifugation. To deplete albumin, 400 μL of CaptureSelect anti-Human serum albumin resin was added to the filter and incubated for 10 min with rotation to mix the sample. Resin was washed with 5 mL MQ water two times and resuspended in 100 μL MQ water before being transferred to centrifuge tubes. After centrifuging at low speed, the supernatant was collected. Protein concentration was determined using the BCA assay and equal amounts of protein were digested with sequencing grade trypsin (1:50, Promega) overnight at 37 °C. The next day, the total peptides were cleaned with C18 spin columns (ThermoFisher) and dried with a vacuum dryer (Savant, ThermoFisher). The cleaned peptides were dissolved in 0.1% formic acid at a concentration of 1.4 μg peptide in 4 μL (1 injection) and analyzed by ultra performance liquid chromatography-ultra definition mass spectrometry (UPLC-UDMSE).

UPLC-UDMSE

TRIZAIC nanoTile 85 μm × 100 mm HSS-T3u wTRAP was used as a separation device. Buffer A was 0.1% formic acid in water and buffer B was 0.1% formic acid in acetonitrile. Samples were loaded, trapped and washed (8 μL/min of 1% buffer B) before the start of the analytical gradient. The analytical gradient, run at 450 nL/min, was as follows: 0–1 min 1% B, at 2 min 5% B, at 65 min 30% B, at 78 min 50% B, at 80 min 85% B, at 83 min 85% B, at 84 min 1% B, and at 90 min 1% B.

Data-independent acquisition was used with UDMSE mode in Synapt G2-S HDMS (Waters Corporation, MA, USA). Calibration was performed with Glu1-Fibrinopeptide B MS2 fragments and the precursor ion was used as a lock mass. Data were collected in the range of 50–2000 m/z with 1 s scan time. Ion-mobility spectrometry (IMS) wave velocity was 650 m/s. All the samples were run in triplicate before being analyzed by Progenesis QI for Proteomics (Nonlinear Dynamics, Newcastle, UK). The coefficient of variation (%CV) for the dataset was 9.83%.
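As an illustration of how a coefficient of variation is obtained from replicate injections, the following minimal Python sketch computes %CV as 100 × standard deviation / mean. The abundance values and protein names are hypothetical, and averaging the per-protein %CV into one dataset-level figure is an assumption, since the text does not state how the reported value was aggregated.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Return the coefficient of variation in percent (sample standard deviation / mean)."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; %CV is undefined")
    return 100.0 * stdev(values) / m


# Hypothetical normalized abundances: three replicate injections per protein
triplicates = {
    "protein_a": [100.0, 110.0, 90.0],
    "protein_b": [50.0, 52.0, 48.0],
}

# %CV per protein, then an (assumed) dataset-level summary as the mean of those values
cv_by_protein = {name: percent_cv(vals) for name, vals in triplicates.items()}
dataset_cv = mean(list(cv_by_protein.values()))
```

A low %CV across technical replicates suggests that injection-to-injection variability is small relative to the measured abundances.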

Data analysis

Data analysis was performed as previously described [17]. Briefly, after importing the raw files to Progenesis QI for Proteomics, lock mass correction was applied using doubly charged Glu1-Fibrinopeptide B (785.8426 m/z). Automatic run alignment and peak picking were performed with the default parameters of the software. Label-free quantitation was performed according to Silva et al. [18]. Protein Lynx Global Server facilitated peptide identification. Every injection of peptides was spiked with 50 fmol of Hi3 peptide mix (Waters), which enabled normalization and quantitation. UniProt human FASTA sequences (UniProtKB Release 2015_09, 20,205 sequence entries) appended with a ClpB protein sequence (Hi3 mix, CLPB_ECOLI, P63285) were used as the search database. A fixed modification at cysteine (carbamidomethyl), a variable modification at methionine (oxidation), and trypsin as the cleaving agent (two miscleavages allowed) were specified. Default peptide and fragment error tolerances were used and the false discovery rate (FDR) was set to a maximum of 1%, after which 361 proteins with two or more unique peptides were identified. The software stops searching the database when the FDR reaches the specified 1%. The original number of proteins identified was 642. Progenesis QI for Proteomics applies the parsimony principle for protein grouping.
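The general idea behind quantitation against a spiked standard can be sketched as follows. This is a simplified illustration of the commonly described "top three peptides" approach, in which the mean intensity of a protein's three most intense peptides is scaled by that of the spiked standard of known amount; it is not the Progenesis QI implementation, and all intensities and function names below are hypothetical.

```python
def top3_mean(intensities):
    """Mean of the three most intense peptide signals for one protein."""
    top = sorted(intensities, reverse=True)[:3]
    if len(top) < 3:
        raise ValueError("at least three peptide intensities are required")
    return sum(top) / 3.0


def estimate_fmol(protein_peptides, standard_peptides, standard_fmol=50.0):
    """Scale the protein's top-3 signal by the spiked standard's top-3 signal.

    standard_fmol defaults to the 50 fmol spike described in the text.
    """
    return standard_fmol * top3_mean(protein_peptides) / top3_mean(standard_peptides)


# Hypothetical peptide intensities (arbitrary units)
standard = [2000.0, 1800.0, 1600.0, 500.0]   # top-3 mean = 1800
protein = [900.0, 1000.0, 800.0, 100.0]      # top-3 mean = 900

amount = estimate_fmol(protein, standard)
```

In this toy case the protein's top-3 signal is half that of the standard, so the estimated on-column amount is half of the spiked quantity.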

Regression modeling and correlation analysis

Feature selection in terms of which clinical covariates were important for modeling and prediction was done using two methods: relative importance and stepwise regression. For regression modeling, no PE vs. PE was selected as the response for feature selection. Continuous variables such as age, height, weight at the beginning of pregnancy, BMIBP and BMI at the end of pregnancy were used as features for the regression modeling. The R package relaimpo was used for both methods. For the correlation analysis, t-tests were used to identify proteins with significantly different (p < 0.05) abundances between the groups. The significantly different proteins as identified by t-test were then correlated to BMIBP by calculating the Pearson correlation coefficient. In total, three sets of correlations were performed. In the first set, the significantly different proteins as identified by t-test from the PE vs. HC comparison were selected and their abundances in HC samples were correlated to BMIBP. In the second set, the significantly different proteins from the PE vs. RC comparison were selected and their abundances in RC samples were correlated to BMIBP. In the third set, proteins that displayed significantly different abundances in either comparison (PE vs. HC or PE vs. RC) were chosen and their abundances in PE samples were correlated to BMIBP. Only samples from the 12–14 weeks group were used for this analysis. A Pearson correlation coefficient of 0.65 or above for a positive correlation and a coefficient of − 0.65 or below for a negative correlation with BMIBP was considered significant. Further details can be found in Supplementary Methods (found in Additional file 1).
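The correlation step described above can be illustrated with a minimal Python sketch. It computes the Pearson correlation coefficient between BMIBP and a protein's abundance and applies the ±0.65 cutoff stated in the text. The analysis in this study was performed in R; the data, protein names, and function names here are hypothetical and serve only to show the calculation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def classify_correlation(r, cutoff=0.65):
    """Label a coefficient using the +/-0.65 cutoff from the text."""
    if r >= cutoff:
        return "positive"
    if r <= -cutoff:
        return "negative"
    return "none"


# Hypothetical pre-pregnancy BMI values and protein abundances at 12-14 weeks
bmi = [22.0, 25.0, 28.0, 31.0, 34.0]
abundance = {
    "protein_up": [1.0, 2.0, 3.0, 4.0, 5.0],
    "protein_down": [5.0, 4.0, 3.0, 2.0, 1.0],
    "protein_flat": [3.0, 1.0, 4.0, 1.0, 3.0],
}

labels = {name: classify_correlation(pearson_r(bmi, vals))
          for name, vals in abundance.items()}
```

With groups of four to seven women, a fixed coefficient cutoff is a pragmatic screening rule rather than a formal significance test.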

Further analysis for the selection of candidate biomarkers

Orthogonal projections to latent structures-discriminant analysis (OPLS-DA) was performed with EZInfo 3.0 (Umetrics, Sweden) and variable influence on projection (VIP) values for each comparison were calculated. The R Package ropls was used for calculating VIP values in partial least squares-discriminant analysis (PLS-DA) modeling for the first time point (12–14 weeks) comparisons. Nearest shrunken centroid (NSC) was used to find important proteins for the first time point comparison using the R package pamr. Training was done on data from each of the three sampling time points using the “pamr.train” function from pamr. The importance of each protein was obtained using 10-fold cross-validation. OPLS-DA modeling was performed and VIP plots produced for every pairwise comparison. Three pairwise comparisons were done for every sampling time point (12–14, 18–20, and 26–28 weeks of gestation), giving a total of nine comparisons. The three comparisons for every time point were PE vs. HC, PE vs. RC, and RC vs. HC. In the coefficients vs. VIP plot, proteins with VIP values of 1 or more were considered to be significantly different between the groups. Receiver operating characteristic (ROC) curve analysis was performed using MetaboAnalyst 4.0 (https://www.metaboanalyst.ca). The area under the curve (AUC) was calculated for individual proteins using the univariate option in MetaboAnalyst 4.0. For the combination of the six proteins subsequently chosen, the multivariate option was used. Column scatter plots were drawn using the R packages ggplot2 and ggpubr. Further details are given in Supplementary Methods (found in Additional file 1).
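For a single protein, the area under the ROC curve equals the probability that a randomly chosen case has a higher abundance than a randomly chosen control, with ties counted as one half. The following minimal Python sketch computes the AUC in this way. It is not the MetaboAnalyst implementation, and the abundance values are hypothetical; only the group sizes (seven cases, four controls) mirror the PE and HC groups.

```python
def roc_auc(case_values, control_values):
    """AUC as the fraction of case/control pairs in which the case scores higher.

    Ties contribute 0.5, which matches the rank-based (Mann-Whitney) definition.
    """
    if not case_values or not control_values:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical normalized abundances of one protein at 12-14 weeks
pe_group = [5.1, 6.0, 4.2, 7.3, 5.8, 6.4, 3.9]
hc_group = [3.0, 4.5, 3.5, 4.0]

auc = roc_auc(pe_group, hc_group)
```

Because the AUC is a ratio of pair counts, small groups yield only a limited set of possible values (here multiples of 1/28), which is worth keeping in mind when comparing AUC values between proteins.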

Results

Study design and protein identification

The samples in this study were divided into three groups: women who developed PE (PE, seven women, 21 samples), women with clinical risk factors for PE who did not develop the disease and were therefore classified as at-risk controls (RC, six women, 18 samples), and controls without known risk factors for PE (so-called healthy controls, HC, four women, 12 samples). At the time of sampling, none of the subjects had PE. We quantified a total of 361 proteins with two or more unique peptides (found in Additional file 2). The samples within each group were also divided according to sampling time points (12–14 weeks, 18–20 weeks, and 26–28 weeks of gestation), leading to a total of nine different comparisons. The study design and division of samples into groups for further analysis are presented in Fig. 1.

Fig. 1

Flowchart showing the study design and subsequent division of samples into groups for further analysis. Samples were collected and processed, after which ultra performance liquid chromatography-ultra definition mass spectrometry (UPLC-UDMSE) was performed. The raw data was analyzed and we subsequently identified 361 proteins with two or more unique peptides. The samples were divided into groups, after which further statistical analyses were performed. First, the samples were divided into three groups: women who developed PE (“Pre-eclampsia”), women with clinical risk factors for PE who did not develop PE and were classified as at-risk controls (“Risk controls”), and controls without known risk factors for PE (“Healthy controls”). Within each group, the samples were also analyzed according to time point when the samples were obtained (12–14 weeks, 18–20 weeks, or 26–28 weeks of gestation)

Regression modeling and correlation

Regression modeling was performed and we subsequently identified BMIBP as the clinical parameter most important for the prediction of the endpoint outcome (incidence of PE). The final Akaike information criterion of the modeling was −22.1 and the misclassification error was 0.35, indicating that approximately 65% of the women were classified correctly by our model using BMIBP. BMIBP was then correlated to the abundance of those proteins with t-test p-values of less than 0.05 (found in Additional file 2). The results of this analysis are given in Supplementary Table 3C in Additional file 2. As can be seen, a total of 37 proteins displayed a negative correlation with BMIBP and 13 proteins displayed a positive correlation with BMIBP in the first set of correlations (see the section “Regression modeling and correlation analysis” in Methods for details). These proteins could therefore be used in combination with BMIBP to predict PE in the HC group. In the second set of correlations, between PE and RC, four proteins were identified that negatively correlated with BMIBP and a total of 67 proteins were identified that positively correlated with BMIBP. These proteins could be used in combination with BMIBP to predict PE in the RC group. In the third set of correlations, six proteins were found to negatively correlate with BMIBP and 20 proteins were found to positively correlate with BMIBP. These proteins could be used together with BMIBP to predict PE in both the HC and RC group. The top three proteins that positively correlated with BMIBP in the third set of correlations were keratin, type I cytoskeletal 24 (KRT24), matrix extracellular phosphoglycoprotein (MEPE), and CD320 antigen (CD320).

Identification of candidate biomarkers

Three different methods were used to find candidate biomarkers that could classify PE vs. HC and RC: OPLS-DA (with coefficients vs. VIP plots), PLS-DA, and NSC. The coefficients vs. VIP plots comparing PE vs. HC and PE vs. RC when only the 12–14 weeks of gestation group was considered are shown in Fig. 2 (for VIP values for the 12–14 weeks of gestation group, see S4 Table). The coefficients vs. VIP plot comparing HC vs. RC for the 12–14 weeks of gestation group can be found in Additional file 1.

Fig. 2

The coefficients vs. VIP plots obtained from OPLS-DA modeling. A) PE vs. HC. B) PE vs. RC. OPLS-DA modeling was performed with EZInfo 3.0 software. The x-axis shows the coefficients and the y-axis the VIP. Proteins passing the cutoff of VIP values ≥1 were considered significantly different. VIP = variable influence on projections, OPLS-DA = orthogonal projections to latent structures-discriminant analysis, PE = preeclampsia, HC = healthy controls, RC = risk controls

Proteins able to classify the groups HC vs. RC, PE vs. HC, and PE vs. RC were compared to each other via Venn diagram. As seen in the Venn diagrams in Fig. 3, the proteins (VIP > 1) in common between the comparisons PE vs. HC and PE vs. RC can differentiate those women who develop PE from HC and RC already in early pregnancy. These 35 proteins are presented as part of S4 Table. The comparisons when only the 18–20 weeks of gestation group was considered identified 18 proteins (part of S5 Table), while the comparisons when only the 26–28 weeks of gestation group was considered identified 14 proteins (part of S6 Table) that were able to classify the different groups. Six proteins, given in Table 2, were common to all comparisons and were therefore able to classify PE vs. HC and PE vs. RC at all three time points.
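The overlap logic behind the Venn diagrams can be sketched in a few lines of Python: proteins passing the VIP cutoff are collected per comparison, and the sets are then intersected. The protein names and VIP values below are hypothetical, only two of the three time points are shown for brevity, and the cutoff is applied as VIP ≥ 1 (the text uses both "≥ 1" and "> 1").

```python
def proteins_above_cutoff(vip_scores, cutoff=1.0):
    """Return the names of proteins whose VIP value meets the cutoff."""
    return {name for name, vip in vip_scores.items() if vip >= cutoff}


def shared_across(comparisons, cutoff=1.0):
    """Intersect the above-cutoff protein sets of several comparisons."""
    sets = [proteins_above_cutoff(scores, cutoff) for scores in comparisons]
    return set.intersection(*sets) if sets else set()


# Hypothetical VIP values keyed by (comparison, gestational weeks)
vip = {
    ("PE_vs_HC", "12-14"): {"P1": 1.8, "P2": 1.2, "P3": 0.4},
    ("PE_vs_RC", "12-14"): {"P1": 1.5, "P2": 1.3, "P3": 1.1},
    ("PE_vs_HC", "18-20"): {"P1": 1.1, "P2": 0.7, "P3": 0.2},
    ("PE_vs_RC", "18-20"): {"P1": 1.4, "P2": 1.6, "P3": 0.9},
}

# Proteins separating PE from both control groups at the first time point
early = shared_across([vip[("PE_vs_HC", "12-14")], vip[("PE_vs_RC", "12-14")]])

# Proteins that do so at every time point included above
all_timepoints = shared_across(list(vip.values()))
```

Requiring a protein to pass the cutoff in every comparison is a conservative filter, which is why the shared set shrinks as more time points are added.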

Fig. 3

Venn diagrams comparing proteins with VIP values > 1. These proteins were compared between the binary comparisons PE vs. HC, PE vs. RC, and HC vs. RC. All comparisons were done separately according to sampling time point (12–14, 18–20, and 26–28 weeks of gestation). Finally, proteins overlapping between the comparisons PE vs. HC and PE vs. RC in all time points were compared to each other, leading to the identification of six proteins that were able to separate HC and RC from PE at all sampling time points. The gene names of these six proteins are shown in the center of the figure. VIP = variable influence on projections, PE = preeclampsia, HC = healthy controls, RC = risk controls

Table 2 The six proteins identified that could classify PE vs. HC and PE vs. RC at all time points

The VIP values obtained from PLS-DA of all significant proteins in the first comparison are given in S7 Table. In NSC-based variable selection, the NSC model had ~ 10% classification error calculated by 10-fold cross-validation in the PE vs. HC comparison and ~ 15% classification error in the PE vs. RC comparison. The proteins glycoprotein hormone alpha chain and prothrombin were found in the PE vs. HC comparison, while the other four were found in the PE vs. RC comparison. The complete list of variables provided by NSC is given in S8 Table.

The normalized abundances of the six proteins identified that could classify PE vs. HC and PE vs. RC at all three time points (12–14, 18–20, and 26–28 weeks of gestation) are presented in column scatter plots in Fig. 4. The abundances of these six proteins are significantly higher in women who developed PE than in women with or without risk factors for PE who did not develop the disease.

Fig. 4

Column scatter plots showing the normalized abundances of the six proteins presented. Samples from all three time points (12–14, 18–20, and 26–28 weeks of gestation) were included. Boxes indicate mean and standard deviation. PE = preeclampsia, RC = risk controls, HC = healthy controls

ROC curve analysis was performed for these six proteins by taking their normalized abundance values separately from the PE vs. RC and PE vs. HC comparisons from the first group (12–14 weeks). The area under the curve (AUC) values for this group are given in Table 3 for both comparisons. We also tested whether a panel composed of these six proteins could classify PE vs. HC and RC. We calculated the combined AUC values of various combinations of these proteins; the highest AUC value that could be reached with a combination of all six proteins was 0.81 for distinguishing PE from HC (see Additional file 1). For PE vs. RC, the highest AUC value for a combination of these six proteins was 0.847 (see Additional file 1). The AUC values for some individual proteins were thus higher than those of the combined panel, indicating that individual proteins such as endosialin and protein AMBP are able to distinguish PE from HC and RC, respectively.

Table 3 The AUC values for the six proteins that were able to classify PE vs. HC and RC

Discussion

In this pilot study, we used UPLC-UDMSE to comprehensively analyze the proteome of prospectively collected urine samples from pregnant women divided into three main groups: PE, RC, and HC. The samples were also divided according to three sampling time points (12–14, 18–20, and 26–28 weeks of gestation), enabling comparisons according to both risk of PE and gestational weeks. We quantified a total of 361 proteins with two or more unique peptides and used multiple methods of statistical analysis to identify candidates for early predictive markers of PE.

Using logistic regression modeling, we were able to identify BMIBP as the strongest predictor of PE among the clinical parameters analyzed. A high body mass index (BMI) is a risk factor for PE [19], although fairly weak as a stand-alone predictor [20]. Here, we identified multiple urine proteins whose abundances significantly correlated with BMIBP that are candidates for new predictive markers of PE. In the third set of correlations in this study, proteins that displayed significantly different abundances in either of the first two comparisons (PE vs. HC or PE vs. RC) were chosen and their abundances in PE samples correlated to BMIBP. The proteins that were subsequently found to have a significant positive correlation with BMIBP, including KRT24, MEPE, and CD320, are likely of particular value as predictive biomarkers, as they may be able to predict PE in both healthy women and women at risk of PE. While these proteins may not be able to classify PE vs. RC and/or HC on their own, they may be of value as predictive markers when used in combination with BMIBP to predict PE. This is something that would be of interest to investigate in future studies. The correlation of urine proteins identified by proteomic analysis with BMIBP for the prediction of PE has, to the best of our knowledge, not been performed before. Only samples from 12 to 14 weeks of gestation were used for the correlation analysis, as proteins whose levels change later during pregnancy may not be as valuable for predicting PE at an early stage. The urine proteins identified here are worth investigating further, as they could be used in combination with BMIBP to predict PE at an early stage.

Using OPLS-DA, PLS-DA, and NSC modeling, we subsequently identified six urine proteins (Table 2) that are strong candidates for new predictive biomarkers of PE. These six proteins were found to have significantly different levels between the PE group and the HC and RC groups using all three methods. These proteins were able to differentiate women who subsequently developed PE from those who did not at all three time points. The AUC values obtained from ROC analysis ranged between 0.833 and 0.964, indicating that these six proteins are excellent classifiers for distinguishing between PE vs. HC and RC already at 12–14 weeks of gestation. Endosialin displayed an AUC value of 0.964 for distinguishing between PE and HC, while protein AMBP displayed an AUC value of 0.929 for distinguishing between PE and RC, indicating their potential as predictive markers of PE. These six proteins were able to classify PE vs. HC and PE vs. RC already at 12–14 weeks of gestation (as well as at 18–20 weeks and 26–28 weeks of gestation) and are therefore of great value for predicting PE during early pregnancy (as well as during mid- and late pregnancy). As can be seen in Fig. 4, the normalized abundances of these six proteins are significantly higher in women who developed PE than in women with or without risk factors for PE who did not develop the disease when samples from all time points were analyzed. This indicates their importance as new predictive markers for PE, although further validation of these proteins is still necessary.

One of the six proteins identified here, namely protein AMBP, has been previously identified as a potential biomarker of PE [21,22,23]. Prothrombin, another of the six proteins identified, is involved in the clotting process. Coagulation parameters have been shown to be altered in women who develop PE [24], indicating a possible role for prothrombin in PE and further suggesting that it could be of value as a predictive marker of PE. The identification of proteins previously linked to PE serves as a form of validation for our pilot study and strengthens our findings.

A recent proteomic study compared urine samples from women with PE and controls and subsequently identified several proteins that were differentially expressed between the groups. However, this study did not determine if changes in the levels of these proteins could be seen early during pregnancy [25]. Another recent study analyzed serum samples from women at 10–12 weeks of gestation, half of whom later developed PE. The authors identified nine serum proteins with significantly different levels between the groups that could be of value as markers for early PE [26]. However, serum samples are more invasive to obtain than urine samples. The findings of our study are of significant value as we have identified six urine proteins that are independently able to distinguish women who developed early-onset PE from both at-risk women and controls without risk factors for PE during early, mid- and late pregnancy.

The inclusion of women who developed early-onset PE in this sub-study was due to the importance of identifying predictive markers for early-onset disease. In this study, six of the seven women in the PE group eventually developed severe PE. While this may indicate that severe PE is overrepresented here, it also shows that for most women who develop early-onset PE, the disease is severe. Further, there are several criteria for the diagnosis of severe PE, including oliguria, seizures, and persistent severe central nervous system symptoms. The presence of any one of these will suffice for the diagnosis of severe PE [27]. As PE is a multi-organ progressive disorder, many patients will therefore eventually meet the criteria for severe PE. While this study was limited by the small sample size, it was strengthened by the multiple statistical methods used for the identification of candidate biomarkers, which increases confidence in the findings. It was also strengthened by the use of non-invasive urine samples, which are easy to obtain. Because only women who subsequently developed early-onset PE were included in the PE group, our findings may not generalize to late-onset PE. However, it is important to find ways to predict early-onset PE, as the consequences are often severe.

Conclusions

In this study, we have used mass spectrometry and rigorous statistical analyses to analyze the urine proteome of pregnant women at varying risk of PE. We have identified multiple proteins that could be of value as predictive markers of PE in combination with BMIBP, as well as six proteins that were identified using OPLS-DA, PLS-DA, and NSC modeling that are strong candidates for predictive markers for PE. While validation of these candidates is still needed, our pilot study indicates that comparative urine proteomics is valuable for the identification of new, non-invasive markers for the early prediction of PE.