Introduction

Osteoarthritis (OA), the most common musculoskeletal disorder, is a multifactorial disease irreversibly affecting several joint tissues, the knee being the most prevalent [1]. OA is a major cause of pain, disability, and comorbidities, and about 30% of the worldwide population aged 50 years and older suffer from this disease [2, 3]. OA progression is influenced by numerous factors including age, gender, obesity (major risk factors), and inflammatory mediators, to name a few.

At present, there are no treatments to cure this disease; the current ones only target symptomatic relief. This is related, in part, to the inability to diagnose OA at an early stage, as the existing methods are not sensitive enough. Early and specific OA diagnosis would allow early and targeted treatments/interventions to prevent or delay not only the progression of the disease but also surgery such as joint replacement. This would result in less pain and a better quality of life for patients, in addition to reducing the substantial societal economic burden [4,5,6].

Because the alteration of the articular tissues develops over a few years, the identification of specific molecules/biomarkers that would enable early OA detection is proving to be a challenging task. To date, there are no regulatory agency-approved biomarkers, as none has yet reached the required specificity, sensitivity, and reliability.

Over the years, several approaches, such as genomics, antibody signature, and metabolomics, have been used to identify biochemical and physiological aspects of OA [7,8,9,10]. Another interesting avenue in the search for biomarkers is proteomics. Compared to metabolomics and genomics, the proteomic approach has the advantage of reflecting the patient’s condition at a specific time, and proteins are more stable than metabolites.

Proteomics using liquid chromatography–tandem mass spectrometry (LC-MS/MS) can identify and quantify thousands of proteins in a single analysis using a relatively small sample amount, which is ideal for the high throughput analysis of a high dynamic range sample such as serum [11]. Such a proteomic approach has been used to identify specific diagnostic markers of many pathologies such as cancer, cardiovascular, liver, and kidney diseases [12,13,14,15,16], as well as some arthritic diseases [8, 17,18,19,20,21,22,23,24], to name a few. LC-MS/MS has been used to monitor the individual proteomes of healthy or OA joint tissues (cartilage, meniscus, synovial membrane), cells (chondrocyte, synoviocyte), and fluids (serum/plasma, synovial fluid, urine) [19, 25,26,27,28,29]. Several proteins that may relate to OA pathological mechanisms have been found but, as mentioned above, no molecule has been validated as a specific marker for OA patients, let alone for the early stages of this disease. This could be due in part to the non-specificity of the molecules, which are related more to pathological conditions other than OA, including obesity [30,31,32,33,34].

Therefore, there is an urgent need to identify novel and specific biomarkers that will prove to be both efficient and sensitive enough to be used for OA early diagnosis. The objective of this study was to identify, with the use of LC-MS/MS, novel OA-specific serum biomarkers.

Material and methods

Study participants

Participants were selected from the control and progressor subcohorts of the Osteoarthritis Initiative (OAI) database. The individuals in the progressor cohort had symptomatic radiographic OA as described (https://oai.nih.gov). Serum samples were from 8 controls and 20 OA patients, the latter equally divided into OA-obese (n=10; body mass index [BMI] ≥30 kg/m²) and OA-non-obese (n=10; BMI <30 kg/m²).

For validation purposes, fasting plasma samples were derived from the Newfoundland and Labrador cohort in which the controls were from the Complex Diseases in Newfoundland population: Environment and Genetics (CODING) [35] and the OA samples from the Newfoundland Osteoarthritis Study (NFOAS; https://www.med.mun.ca/NFOAS/Home.aspx) [36]. Plasma samples were from 20 controls and 20 OA, the latter equally divided into OA-obese (n=10) and OA-non-obese (n=10).

The characteristics of the selected individuals are listed in Table 1 (OAI) and Table 2 (CODING and NFOAS). For the OAI, the demographic, clinical, and radiographic data were obtained from the OAI database (https://oai.nih.gov).

Table 1 Osteoarthritis Initiative (OAI) participant characteristics
Table 2 Complex Diseases in Newfoundland population: Environment and Genetics (CODING) (control) and the Newfoundland Osteoarthritis Study (NFOAS) (OA) participant characteristics

All participants had provided written informed consent for their participation. For the OAI cohort, the ethics approval was obtained by each of the OAI clinical sites (University of Maryland Baltimore Institutional Review Board, Ohio State University’s Biomedical Sciences Institutional Review Board, University of Pittsburgh Institutional Review Board, and Memorial Hospital of Rhode Island Institutional Review Board) and the OAI coordinating center (Committee on Human Research at the University of California, San Francisco, CA, USA). For the CODING and NFOAS cohorts, the ethics approval was obtained from the Health Research Ethics Board of Newfoundland and Labrador.

The Institutional Ethics Committee Board of the University of Montreal Hospital Research Centre approved the use of the human serum/plasma.

Serum/plasma samples

Serum/plasma samples were obtained from the OAI (refer to the OAI operations manual detailing specimen collection and processing methods [https://oai.nih.gov]) and the CODING/NFOAS, as previously described [36, 37]. The specimens were collected after an overnight fast using a uniform protocol. For the plasma, blood was collected and plasma separated from the red cells immediately after collection by centrifugation (20,000 rpm for 10 min). Upon reception, samples for both cohorts were aliquoted, stored frozen at −80°C, and thawed at 4°C just before use.

Mass spectrometry

Preparation of serum samples

Data for the samples (non-depleted and depleted) were both acquired in Data Dependent Acquisition mode and analyzed with MaxQuant software, version 1.6.7 [38], as previously described [39].

The non-depleted samples were randomized before analysis. One microliter of each serum sample was diluted in 24 μl of sodium deoxycholate (SDC) buffer consisting of 1% deoxycholate/10 mM Tris (2-carboxyethyl)phosphine/40 mM chloroacetamide/100 mM Tris pH 8.5, heated for 10 min at 95°C, followed by treatment with a mixture of trypsin and Lys-C (Promega, Madison, WI, USA) (0.66 μg of each enzyme) for 1 h at 37°C. The digestion was stopped with 5 μl 50% formic acid causing the precipitation of deoxycholate. The samples were then centrifuged at 16,000g for 15 min at 4°C.

The peptides contained in the supernatant were purified on StageTips C18 Empore (3M, St-Paul, MN, USA) according to Rappsilber et al. [40]. Finally, the peptides were vacuum dried and stored at −20°C prior to mass spectrometry analysis.

High-abundance protein depletion for building a matching library

To increase the number of peptide/protein identifications in the final analysis, a matching library built from depleted serum was prepared for use with the MaxQuant software, as described by Geyer et al. [41]. By adding a library of depleted serum to the analysis, this strategy took advantage of the “match between runs” function of the MaxQuant software, where peptides identified by MS/MS in the library can be matched to the non-depleted samples to recover their quantification even without MS/MS. This library was obtained by pooling 2 μl of each patient’s serum sample, which was then depleted of high-abundance proteins using the Seppro IgY14 Spin Column kit according to the manufacturer’s protocol (Sigma-Aldrich, St Louis, MO, USA). The flow-through was collected, and the proteins were precipitated with the addition of 5 volumes of ice-cold acetone and incubated overnight at −20°C. After centrifugation at 10,000g for 10 min, the pellet was resuspended in 120 μl of SDC buffer and heated at 95°C for 10 min. After cooling, the pooled sample was digested at trypsin:protein and Lys-C:protein ratios of 1:100, with the protein amount determined by a Bradford protein assay. The resulting peptides were purified on an Oasis HLB Cartridge (Waters) according to the manufacturer’s procedure. The peptides were then fractionated by high-pH reversed-phase peptide chromatography according to Yang et al. [42]. The 12 resulting fractions were vacuum dried and stored at −20°C prior to mass spectrometry analysis.

Liquid chromatography (LC)-MSMS analysis

Both non-depleted samples and fractions of the depleted pool were analyzed, as previously described [43]. In brief, samples or fractions were resuspended with 30 μl 2% acetonitrile/0.05% trifluoroacetic acid. Protein concentration was determined at 205 nm using a NanoDrop 2000 spectrophotometer (Thermo Scientific, Waltham, MA, USA); the protein concentration was adjusted to 0.2 μg/μl. Five microliters of the resuspended peptide digestion (equivalent to 1 μg peptides) was injected on a nanoflow liquid chromatography/MSMS (nanoflow LC-tandem MS). The experiments were performed with a Dionex UltiMate 3000 nanoRSLC chromatography system (Thermo Fisher Scientific/Dionex Softron GmbH, Germering, Germany) connected to an Orbitrap Fusion Tribrid ETD mass spectrometer (Thermo Fisher Scientific, San Jose, CA, USA) equipped with a nano electrospray ion source. Peptides were trapped at 20 μl/min in a loading solvent (2% acetonitrile, 0.05% trifluoroacetic acid [44]) on a 5-mm length 300 μm Internal Diameter (I.D.), 5 μm particles Acclaim™ PepMap™ 100 pre-column cartridge (Thermo Fisher Scientific/Dionex Softron GmbH) for 5 minutes. Then, the pre-column was switched online with 500-mm length, 75 μm I.D., 3 μm particles, Acclaim™ PepMap™ 100 C18 analytical column (Thermo Fisher Scientific/Dionex Softron GmbH), and the peptides were eluted with a linear gradient from 5 to 40% (A: 0.1% formic acid, B: 80% acetonitrile, 0.1% formic acid) for 90 min, at 300 nl/min. Mass spectra were acquired using a Data Dependent Acquisition mode (Thermo XCalibur software, version 4.3). Full scan mass spectra (350 to 1800 m/z) were acquired in the orbitrap using an automatic gain control (AGC) target of 4e5, a maximum injection time of 50 ms, and a resolution of 120,000. Internal calibration using lock mass on the m/z 445.12003 siloxane ion was used.
Each MS scan was followed by the acquisition of fragmentation MSMS spectra of the most intense ions for a total cycle time of 3 s (highest speed mode). The selected ions were isolated using the quadrupole analyzer in a window of 1.6 m/z and fragmented by higher energy collision-induced dissociation (HCD) with 35% of collision energy. The resulting fragments were detected by the linear ion trap at a rapid scan rate with an AGC target of 1e4 and a maximum injection time of 50 ms. Dynamic exclusion of previously fragmented peptides was set for a period of 20 s and a tolerance of 10 ppm.
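The loading amount and the ppm tolerance quoted above reduce to simple arithmetic, sketched here in Python. The helper names are hypothetical, and the lock-mass m/z is used only as a convenient example value for the ppm calculation; in practice the dynamic exclusion tolerance applies to each precursor m/z.

```python
# Illustrative arithmetic only; function names are hypothetical.

def injected_mass_ug(volume_ul, conc_ug_per_ul):
    """Peptide mass loaded on the column = injection volume x concentration."""
    return volume_ul * conc_ug_per_ul


def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z bounds of a symmetric ppm tolerance."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


loaded = injected_mass_ug(5.0, 0.2)        # 5 ul at 0.2 ug/ul
low, high = ppm_window(445.12003, 10.0)    # 10 ppm around an example m/z

print(f"peptides loaded: {loaded:.1f} ug")
print(f"10 ppm window: {low:.5f} to {high:.5f} m/z")
```

At m/z values of a few hundred, a 10 ppm tolerance corresponds to a window only a few thousandths of an m/z unit wide.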

Database searching and label-free quantification

Spectra were searched against a human protein database (Uniprot Homo sapiens Reference Proteome – UP000005640 – 74435 entries - 21.04.2019) using the Andromeda module of the MaxQuant software [39]. In brief, the trypsin/P enzyme parameter was selected with two possible missed cleavages. Carbamidomethylation of cysteines was set as a fixed modification, and methionine oxidation and deamidation of glutamine and asparagine as variable modifications. Mass search tolerances were 5 ppm and 0.5 Dalton for MS and MS/MS, respectively. For protein validation, a maximum false discovery rate of 1% at peptide and protein levels was used based on a target/decoy search. MaxQuant was also used for label-free quantification with a minimum ratio count of 1. The “match between runs” algorithm was used with 20 min as alignment time window and 0.7 min as match time window values to enable a peptide MS1 signal match between the matching library consisting of fractions of depleted samples and the non-depleted serum samples. Only unique and razor peptides were used for quantification. All other parameters were set at default values.
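The idea behind a target/decoy false discovery rate can be sketched in a few lines of Python. This is a simplified illustration, not MaxQuant's actual procedure: the function name, the score scale, and the toy identifications are all assumptions, and the FDR is estimated here simply as the number of decoys divided by the number of targets above a score cutoff.

```python
def fdr_cutoff(matches, max_fdr=0.01):
    """Return the lowest score at which decoys/targets stays within max_fdr.

    matches: list of (score, is_decoy) pairs; a higher score is a better match.
    Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among everything scoring at least `score`
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Toy data (made up): 200 confident targets, then targets and decoys interleaved
matches = [(s, False) for s in range(101, 301)]
matches += [(100, True), (99, False), (98, True),
            (97, False), (96, True), (95, False)]

print(fdr_cutoff(matches))
```

With these toy values, accepting matches scoring 97 or more keeps two decoys among 202 targets, just under 1%; the third decoy at score 96 would push the estimate above the limit.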

Protein assays

The proteins tested were fibrillin-1 (FBN1), vitamin D-binding protein (VDBP), and SERPINF1. They were determined with specific assays according to the manufacturers’ specifications. FBN1 was quantitated by ELISA (dilution 1:5; #MBS3804755, MyBiosource, San Diego, CA, USA), VDBP with a Multiplex assay (dilution 1:10000; #HCCBP2MAG-58K, EMD Millipore Corporation, Billerica, MA, USA), and SERPINF1 by Luminex assay (dilution 1:4000; #LXSAHM-01, R&D systems, Minneapolis, MN, USA). Protein quantification was performed using the LiquiChip 200 apparatus, and the data analysis was performed with the LiquiChip Analyzer software (Qiagen, Toronto, ON, Canada). For each biomarker, an 8-point standard curve and appropriate controls were included, and samples were done in duplicate. The minimum detectable doses were: FBN1, 0.312 ng/ml; VDBP, 0.58 ng/ml; and SERPINF1, 3.66 pg/ml.
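As a rough illustration of how a duplicate reading is converted into a concentration, the Python sketch below interpolates on a standard curve and corrects for the sample dilution. It is only a sketch under stated assumptions: the standard values and readings are made up, the function name is hypothetical, and log-log linear interpolation stands in for whatever curve model the analysis software actually fits (often a four- or five-parameter logistic).

```python
import math


def interpolate_conc(signal, standards):
    """Log-log linear interpolation between the two bracketing standards.

    standards: list of (concentration, signal) pairs sorted by concentration,
    with signal increasing along the curve.
    """
    for (c0, s0), (c1, s1) in zip(standards, standards[1:]):
        if s0 <= signal <= s1:
            frac = (math.log(signal) - math.log(s0)) / (math.log(s1) - math.log(s0))
            return math.exp(math.log(c0) + frac * (math.log(c1) - math.log(c0)))
    raise ValueError("signal outside the range of the standard curve")


# Hypothetical 8-point, two-fold standard series (concentration, signal)
standards = [(2.0 ** i, 10.0 * 2.0 ** i) for i in range(8)]

duplicate_wells = [58.0, 62.0]                 # made-up duplicate readings
mean_signal = sum(duplicate_wells) / len(duplicate_wells)
dilution_factor = 5                            # e.g. a 1:5 dilution

concentration = interpolate_conc(mean_signal, standards) * dilution_factor
print(f"{concentration:.2f} (units of the standard curve)")
```

Readings that fall outside the standard range raise an error rather than being extrapolated, which mirrors the usual practice of re-assaying such samples at a different dilution.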

Data treatment and statistical analysis

The proteinGroups.txt file generated by MaxQuant was used in R software, version 3.4 [45]. The intensity values of each peptide in each non-depleted serum sample were normalized using the median of all intensity values in each sample (normalization by column). For each comparison, only peptides having at least 60% of non-missing values across all the non-depleted samples were considered as quantifiable. Missing values remaining after this filtering were imputed using a noise value calculated as the first centile of all intensity values per sample (calculation per column), as previously described [46]. Only proteins with at least two quantified peptides were kept for further analysis.
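The three preprocessing steps described above (median normalization per sample, the 60% presence filter, and noise imputation with the first centile) can be sketched as follows. The study used R; this Python version is only an illustration, with hypothetical function names and made-up intensities. It assumes that normalization means dividing by the sample median and that the centile is obtained by linear interpolation on the normalized values.

```python
from statistics import median


def percentile(values, pct):
    """Percentile (pct in 0..100) of a non-empty list, by linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def preprocess(table, min_present=0.60):
    """table: {feature: [intensity or None for each sample]}."""
    n_samples = len(next(iter(table.values())))
    columns = range(n_samples)

    # 1. Normalize each sample (column) by the median of its observed values
    medians = [median(row[j] for row in table.values() if row[j] is not None)
               for j in columns]
    norm = {name: [None if v is None else v / medians[j]
                   for j, v in enumerate(row)]
            for name, row in table.items()}

    # 2. Keep features observed in at least `min_present` of the samples
    kept = {name: row for name, row in norm.items()
            if sum(v is not None for v in row) / n_samples >= min_present}

    # 3. Impute remaining gaps with a per-sample noise value (first centile)
    noise = [percentile([row[j] for row in norm.values() if row[j] is not None], 1)
             for j in columns]
    return {name: [noise[j] if v is None else v for j, v in enumerate(row)]
            for name, row in kept.items()}


# Made-up intensities for four features across five samples
table = {
    "pepA": [10, 20, 30, 40, 50],
    "pepB": [20, None, 60, 80, 100],
    "pepC": [None, None, None, 120, 150],
    "pepD": [30, 40, 90, 120, 150],
}
result = preprocess(table)
print(sorted(result))   # pepC is dropped: observed in only 2 of 5 samples
```

In this toy table, pepB is kept because it is observed in 4 of 5 samples, and its single gap is filled with the noise value of the corresponding sample.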

For the analysis of differential expression between two groups, a protein ratio was calculated using the average of protein intensities in all samples of the same group. These ratios were then converted into z-score (z = (x-μ)/σ where x =log2(ratio); μ = average of all log2(ratios); σ =standard deviation of all log2(ratios)) for data centering. Statistical analysis was performed using the Limma Bioconductor package [47] to define the probability of variation (p-value) of each protein between two groups. This method has been preferred to the usual Student t-test as it has been shown to be less sensitive to the number of biological replicates. This was followed by the Benjamini-Hochberg method to adjust for multiple comparison (q-value). Proteins with a q-value < 0.050 and absolute value of z-score > 1.96 were considered significantly different.
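The z-score conversion and the Benjamini-Hochberg adjustment described above can be sketched in Python as follows. This is an illustration only: the function names and toy intensities are assumptions, the sample standard deviation is assumed for σ, and the p-values are taken as given because the moderated test of the Limma package is not reimplemented here.

```python
from math import log2
from statistics import mean, stdev


def ratio_z_scores(group_a, group_b):
    """z = (x - mu) / sigma, where x = log2(mean of group A / mean of group B)."""
    log_ratios = {p: log2(mean(group_a[p]) / mean(group_b[p])) for p in group_a}
    values = list(log_ratios.values())
    mu, sigma = mean(values), stdev(values)   # sample SD is an assumption
    return {p: (x - mu) / sigma for p, x in log_ratios.items()}


def benjamini_hochberg(p_values):
    """Return q-values (BH-adjusted p-values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):              # from the largest p to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def is_significant(q, z, q_max=0.050, z_min=1.96):
    """Apply both criteria from the text: q < 0.050 and |z| > 1.96."""
    return q < q_max and abs(z) > z_min


# Toy intensities (made up) for three proteins in two groups
oa = {"P1": [400, 420, 380], "P2": [50, 55, 45], "P3": [200, 210, 190]}
ctl = {"P1": [100, 110, 90], "P2": [50, 55, 45], "P3": [200, 210, 190]}
z = ratio_z_scores(oa, ctl)

q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(z)
print(q)
```

Requiring both criteria means that a protein with a small q-value but a ratio close to the center of the ratio distribution is not reported as regulated.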

Further, two multivariate methods were used through the MixOmics R package [48]. First, to compare the proteomic profiles, the multivariate unsupervised principal component analysis (PCA) [49] followed by the pairwise comparison were used. PCA makes it possible to cluster the samples by reducing the dimension of the expression data with minimum information loss and to visualize the similarities between the proteins. It is a linear dimension-reduction method whose loadings provide a relative weighting of the protein importance. Second, to select the most predictive/discriminative features, the supervised classification model sparse partial least squares regression discriminant analysis (sPLS-DA) [50] was used. This method is a linear classification model enabling discriminative variable selection that could predict the outcome. It seeks components that best separate the samples. Moreover, this method provides a graphical representation of the components and proteins that assists in the interpretation of the results. The number of components and variables was defined after a tuning step to optimize the distinction between the three groups (control, OA-obese, OA-non-obese).
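To make the notions of component, loading, and explained variability concrete, the Python sketch below extracts the first principal component of a small data matrix by power iteration on the covariance matrix. It is a didactic stand-in, not the MixOmics implementation used in the study; the function name and the toy data are assumptions, and sPLS-DA is not reproduced here.

```python
def first_principal_component(rows, n_iter=200):
    """rows: samples x features. Returns (loadings, scores, explained fraction)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]

    # Sample covariance matrix (features x features)
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]

    # Power iteration converges to the eigenvector with the largest eigenvalue
    vec = [1.0] * p
    for _ in range(n_iter):
        nxt = [sum(cov[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]

    eigenvalue = sum(vec[a] * sum(cov[a][b] * vec[b] for b in range(p))
                     for a in range(p))
    total_variance = sum(cov[a][a] for a in range(p))
    scores = [sum(c[j] * vec[j] for j in range(p)) for c in centered]
    return vec, scores, eigenvalue / total_variance


# Toy matrix (made up): four samples, two strongly correlated features
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8]]
loadings, scores, explained = first_principal_component(data)
print(loadings, explained)
```

The loadings give the relative weight of each feature in the component, the scores are the sample coordinates plotted on a PCA axis, and the explained fraction corresponds to the percentage shown on that axis.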

For the validation experiments, the differences between groups were assessed using the Student t-test. A value of p≤0.050 was considered statistically significant. Statistical analysis was performed using the GraphPad Prism 8 (San Diego, CA, USA).

Results

Subject characteristics

Table 1 shows the characteristics of the participants from the OAI cohort comparing control, OA, OA-obese, and OA-non-obese individuals. The obese/non-obese division was performed in an attempt to discriminate proteins not specific to OA but to obesity. Compared to controls, OA patients were older (p=0.037) and had higher BMI (p=0.011), Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) scores (p≤0.0003), and Kellgren-Lawrence grades (p<0.0001), and smaller medial joint space width (p<0.0001). Comparison of OA-obese with OA-non-obese showed only, as expected, that the former had a higher BMI (p=0.047). When each of the two OA subgroups was compared to controls, data were comparable to the total OA group, but OA-non-obese were slightly older (p=0.023) and had a similar BMI.

Table 2 shows that none of the participant characteristics differed between the CODING (controls) and NFOAS (OA) cohorts. Compared to the OAI controls, the CODING participants were older (p<0.0001) and had a higher BMI (p=0.002), and OA participants from the NFOAS had higher WOMAC scores (p<0.0001) than those from the OAI.

Quantitative proteomic analysis

Principal component analysis (PCA)

A shotgun proteomic analysis was performed on the non-depleted individual serum samples. Five hundred and nine (509) proteins could be identified in at least one individual sample. As mentioned above, in addition to the non-depleted, we added a depleted serum library for the database searching and quantification. Such an addition boosted the protein identification by 28%. Two hundred and seventy-nine (279) proteins (Table S1) were quantified after filtering for proteins having at least 60% of non-missing values in at least one of the two compared conditions and having two quantified peptides or more to retain only high-quality protein measurements. This data was used to explore the global proteomic profile of each sample and group through a PCA analysis. This unsupervised multivariate method (Fig. 1) generates principal component axes that best explain the variability in the data without knowing the group of the sample. The data showed that the three groups (control, OA-obese, OA-non-obese) could not be clearly distinguished based on their global proteomic profile suggesting that the differences between the groups might be from low variations in protein expression and/or variations on a small number of protein species.

Fig. 1

Principal component analysis (PCA) (unsupervised). Serum samples were from controls (n=8; + [gray]), osteoarthritis (OA)-obese (n=10; ○ [blue]), and OA-non obese (n=10; Δ [orange]) and analyzed by PCA. PCA represents the maximum variability that exists between different samples and unsupervised means regardless of the group to which they belong. The closer the points are in the PCA, the closer the proteomic profiles of the corresponding samples. In each axis, the percentage represents the total variability between all points. PC, principal component

Pairwise differential expression analysis

To unveil the small differences between the groups, pairwise differential analyses were performed using the protein quantitative value. Comparisons were made between controls and each of OA-non-obese and OA-obese, as well as between OA-obese and OA-non-obese. For each comparison, protein ratios were calculated between the two groups and converted into z-score for data centering. Statistical analysis was performed with the Limma method. Table S2 lists the normalized intensity values, means, ratios, and z-scores for the 12 proteins that were found significantly differentially expressed in at least one of the three pairwise comparisons, and Table 3 summarizes the data. Of note, differential expression analyses could not be performed for FBN1, comparing OA-obese with controls, and for lysine-specific demethylase 4C/4B/4E (KDM4C/4B/4E), comparing OA-non-obese with controls, as these proteins could not be quantitated accurately due to missing values in the OA-obese and OA-non-obese groups, respectively (Table S2 and Table 3).

Table 3 Principal component analysis-pairwise differential expression

For these 12 proteins, pairwise comparison revealed that 8 were differentially regulated between OA-obese and controls, and also 8 between OA-non-obese and controls, with some proteins being common to both comparisons (Table 3, Fig. 2A, B). No protein was found differentially regulated between the two OA subgroups (Fig. 2C). One may also note that, in Fig. 2A, B, the ratio distribution is not centered when OA obese and OA non-obese are compared to control. In the latter, there are slightly fewer quantified proteins; however, the overall intensity is somewhat stronger. Although this cannot be explained at present, to overcome this issue, we centered the data by calculating a z-score and considered proteins as regulated or not between two conditions based on both their q-value and z-score.

Fig. 2

Principal component analysis (PCA)-pairwise differential expression. Volcano and box plots of statistically differently regulated proteins (for the volcano plot, red when osteoarthritis (OA) was lower than controls and blue when OA was higher than controls) between A OA-obese (n=10, OA-ob. [hatched]) and controls (n=8, CTL [white]), B OA-non obese (n=10, OA-non ob. [hatched]) and controls (n=8, CTL [white]), and C OA-obese (n=10) and OA-non obese (n=10). The intensity values for each protein and sample were obtained from the mass spectrometry analysis (refer to Table S2) and transformed as Log2. Statistical analysis used the Limma method, and q<0.050 was considered statistically significant; a strong trend toward statistical difference is indicated in italic. ACT, actins; ADIPOQ, adiponectin; CRP, C-reactive protein; CRTAC1, cartilage acidic protein 1; FBN1, fibrillin 1; IGHV3-35, Ig heavy variable 3-35; KDM4C/4B/4E, lysine-specific demethylase 4C/4B/4E; KHSRP, KH-Type Splicing Regulatory Protein; LYZ, lysozyme; PTGDS, prostaglandin-H2 D-isomerase; S100A9, S100 Calcium Binding Protein A9

Compared to controls, data revealed that in the OA-obese group (Fig. 2A, Table 3), CRP, CRTAC1, LYZ, PTGDS, IGHD, and KDM4C/4B/4E were all upregulated, whereas ACTA1/ACTC1/ACTG2/ACTA2 and ADIPOQ were downregulated. In the OA-non-obese/control comparison (Fig. 2B), CRP, CRTAC1, LYZ, PTGDS, FBN1, IGHV3-35, KHSRP, and S100A9 were all upregulated in the OA-non-obese; PTGDS was included in the upregulated proteins as the q-value (q=0.054) showed a strong trend towards significance.

Sparse partial least squares regression discriminant analysis (sPLS-DA)

The pairwise comparison between the two OA subgroups did not reveal proteins that were significantly different and that could be related to the obesity condition. To mine deeper into the data and unveil proteins related to obesity, and not necessarily specific to OA, we performed another multivariate analysis, the sPLS-DA. This supervised analysis enabled the selection of the most discriminative proteins in the data to classify the samples [50].

Data revealed that a very good classification (area under the curve [AUC] >95%) was obtained with two components. Component 1 (9 proteins; Fig. 3) comprised proteins discriminating the two OA groups from the controls, and component 2 (23 proteins; Fig. 4) discriminated the OA-non-obese from the OA-obese. In a given component, each protein does contribute in combination but not equally to the discrimination process, i.e., when a protein is removed from a component, the discriminatory strength of the component is altered.
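For readers less familiar with the AUC, it can be read as the probability that a randomly chosen sample from one class receives a higher classifier score than a randomly chosen sample from the other class. The Python sketch below computes it that way; the function name and the scores are made up for illustration and do not come from the study, and the exact procedure used by the sPLS-DA software may differ.

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical component scores for OA (positive) and control (negative) samples
oa_scores = [0.90, 0.80, 0.70, 0.35]
control_scores = [0.40, 0.30, 0.20]

print(f"AUC = {auc(oa_scores, control_scores):.3f}")
```

An AUC of 1 means the two classes are perfectly separated by the score, whereas 0.5 corresponds to chance-level ranking.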

Fig. 3

Sparse partial least squares discriminant analysis (sPLS-DA) contribution to component 1. Serum samples were from controls (n=8), osteoarthritis (OA)-obese (n=10), and OA-non obese (n=10) and analyzed by sPLS-DA. A The dot plot component 1 vs. component 2 allowed for the identification of component 1 discriminating both the OA-obese (○ [blue]) and OA-non obese (Δ [orange]) groups from controls (+ [gray]). B Contribution of the 9 proteins in component 1; the plots display the loading weight and indicate the class (OA-obese [blue]; OA-non obese [orange]) for which the selected protein has a maximal mean value; the negative value indicates contributions higher in the OA compared to the control group. C Box plots of each protein comprised in component 1. The intensity values for each protein and each sample were obtained from the mass spectrometry analysis, and the mean of the intensity values was calculated for each protein and transformed as Log2. Statistical analysis used the Limma method, and q<0.050 was considered statistically different. OA-obese (OA-ob. [hatched left]), OA-non obese (OA-non ob. [hatched right]), and control (CTL [white]). CRTAC1, cartilage acidic protein 1; GC, vitamin D binding protein; C1R, complement C1r subcomponent; Serpin F1, pigment epithelium-derived factor; PROS1, vitamin K-dependent protein S; SEPP1, selenoprotein P; C1QC, complement C1q subcomponent subunit C; ITIH4, inter-alpha-trypsin inhibitor heavy chain 4; APCS, serum amyloid P

Fig. 4

Sparse partial least squares discriminant analysis (sPLS-DA) contribution to component 2. Serum samples were obtained from osteoarthritis (OA)-obese (n=10) and OA-non obese (n=10) individuals and analyzed by sPLS-DA. Contribution of the 23 proteins in component 2; the plots display the loading weight and indicate the class (OA-obese, [blue]; OA-non obese, [orange]; control, [gray]) for which the selected protein has a maximal mean value. The negative value indicates contributions higher in the OA-obese compared to the OA-non obese group, and the positive value indicates contributions higher in the OA-non obese group. ADIPOQ, adiponectin; APOA1, apolipoprotein A-I; APOC1, apolipoprotein C-I; APOL1, apolipoprotein L1; DBH, dopamine beta hydroxylase; F12, coagulation factor XII; GP1BA, platelet glycoprotein Ib alpha chain; GPX3, glutathione peroxidase 3; HPR, haptoglobin-related protein; IGHD, Ig delta chain C region; IGFALS, insulin-like growth factor-binding protein complex acid labile subunit; IGLV2-14, Ig lambda chain V-II region TOG; IGLV4-69, Ig lambda variable 4-69; IGKV2-24, Ig kappa variable 2-24; IGLV2-23, Ig lambda chain V-II region NEI; IGKV3-15, Ig kappa chain V-III region POM; IGLV3-21, Ig lambda chain V-III region LOI; IGHV3OR16-12, Ig Heavy Variable 3/OR16-12 (Non-Functional); PON1, serum paraoxonase/arylesterase 1; PROC, vitamin K-dependent protein C; SERPINA6, corticosteroid-binding globulin; SERPINA3, alpha-1-antichymotrypsin; SERPINC1, antithrombin-III

Figure 3A illustrates a clear separation of the control group from the two OA subgroups, which is particularly visible in component 1. Figure 3B shows the contribution of each of the 9 proteins comprised in component 1 listed by order of importance—CRTAC1, GC, C1R, SERPINF1, PROS1, SEPP1, C1QC, ITIH4, and APCS. Of note, CRTAC1, which was found to contribute the most, was also identified previously in the pairwise analysis as upregulated in both OA-obese and OA-non-obese compared to controls (Fig. 2, Table 3). Figure 3C shows the intensities of the 9 proteins contributing to component 1 for each group and their comparisons between the groups. Compared to controls, both OA groups were upregulated for all 9 proteins and statistical difference was reached for all in the OA-non-obese. Although values of both OA-obese and OA-non-obese were relatively similar for all the 9 proteins, comparison between the OA-obese with controls showed that the proteins PROS1 and SEPP1 did not reach statistical difference.

Component 2 is a group of 23 proteins that discriminates OA-obese from OA-non-obese. Figure 4 shows the contribution value of each of these proteins. Importantly, none of the 23 proteins was found in component 1, which discriminates OA from controls, and only the protein ADIPOQ (with a very low contribution) was previously identified in the pairwise comparison, as down-regulated in OA-obese compared to controls (Fig. 2A, Table 3 and Table S2).

Some of the proteins of component 2 were involved in the coagulation/fibrinolysis pathways or lipid metabolism. Also listed are some immunoglobulins, mostly light chains (lambda and kappa variable). Regarding the contribution of each protein to component 2, ApoC1 and SERPINC1 were the proteins with the strongest contribution in the OA-non-obese group, while HPR, IGKV3-15, and APOL1 led in the OA-obese group.

Protein validation

To complement this work, comparison of three proteins (FBN1, VDBP, and SERPINF1) using plasma from another cohort (CODING and NFOAS) was performed between controls and OA. Data showed that statistical difference was reached when OA was compared to controls for FBN1 (p=0.044) and VDBP (p=0.022), with a trend toward significance for SERPINF1 (p=0.064) (Fig. S1). Of note, no difference was obtained when OA-obese and OA-non-obese were compared for any of the three proteins studied (p=0.656, p=0.104, and p=0.315, respectively) (Fig. S1), suggesting that these proteins are not likely obesity-regulated.

Discussion

The search for a reliable biomarker in OA is an active field of investigation. Our study identified the proteins CRTAC1, FBN1, VDBP, and possibly SERPINF1 as potential new and OA-specific serum biomarkers.

To gather the most information about the proteomic analysis performed on our serum samples, we first assessed their proteomic profile through an unsupervised PCA analysis, then two methodologies were used to recover the most discriminative proteins involved in each group: a pairwise differential analysis based on the Limma (Student derived) statistical test and a supervised sPLS-DA analysis. The latter enabled us to find proteins discriminating OA-obese from OA-non-obese groups, which was not possible with the pairwise comparisons.

The PCA data revealed that controls can be partially discriminated from OA patients based on their global proteomic profile, while OA-obese and OA-non-obese patients cannot be differentiated. This was confirmed by pairwise differential expression analyses, which revealed that CRP, LYZ, CRTAC1, and PTGDS were all upregulated in OA individuals compared to controls. These proteins are not likely obesity-regulated as they were significantly higher than the controls in both OA-obese and OA-non-obese in addition to not being found differently regulated in the sPLS-DA component 2, which evaluates proteins between the two OA groups.

CRTAC1 appears to be a strong OA biomarker candidate as it is the only protein identified in both pairwise (increased intensity levels in OA compared to controls) and sPLS-DA (highest contribution in component 1) analyses. However, very little is known about this protein and its role, not only in OA but also in normal human physiology. Two splice variants of this gene have been reported and, in regard to articular tissues, CRTAC1-A is the predominant form in cartilage [51]. In OA knees, studies have reported that it is a glycosylated extracellular molecule found in the inter-territorial matrix of the deep zone of the cartilage as well as in synovial fluid and serum [21, 26, 51]. It is upregulated in late-stage OA cartilage compared to healthy or early OA cartilage [52, 53]. While the present work was being prepared, a proteomic study done on an Icelandic population (n=39,155 including 12,178 OA) corroborated our finding, in that CRTAC1 was the protein most strongly associated (among 4792 proteins studied) with OA diagnosis and progression to joint replacement [54]. This supports CRTAC1 as a strong and promising biomarker candidate for OA.

FBN1 is an extracellular matrix protein that assembles into microfibrils to form the template for elastic fiber formation. In the pairwise analysis, data showed its upregulation in OA-non-obese compared to controls. Unfortunately, in this analysis, the protein could not be assessed in the OA-obese group as it had too many missing values to assign a final score. However, in the sPLS-DA component 2, this protein did not discriminate OA-non-obese from OA-obese. Complementary experiments confirmed the statistical difference of this protein between OA and controls; in addition, no difference was found between OA-non-obese and OA-obese, suggesting that it is not likely regulated by obesity factors. FBN1 was previously identified in the synovial fluids of OA patients, but no comparison with controls was done [55]. There are three isoforms of FBN, FBN1 being the most abundant in adult tissues [56]. Related to OA, FBN1 has been reported to sequester a key factor involved in the disease’s cartilage and bone, the latent TGF-β1 complex, regulating its bioavailability [57,58,59]. In addition, FBN1 was found associated with two other musculoskeletal diseases, systemic sclerosis and Marfan syndrome [60,61,62,63]. FBN1 would be an interesting molecule for further analysis as a potential OA biomarker.

Other proteins were found upregulated in OA compared to controls: CRP, LYZ, and PTGDS. However, they alone would not be suitable choices as specific OA biomarkers due to their rather non-specific role (CRP, a general marker of inflammation [64, 65]; LYZ, an antibacterial role) or their strong link to other pathological conditions (PTGDS) [66,67,68,69,70]. Nonetheless, it is worth mentioning that a ratio of serum CRP to another molecule (monocyte chemoattractant protein-1 [MCP-1]) was suggested as an OA biomarker. This ratio has been found to be associated with OA symptoms and, in combination with other factors, to predict which OA individuals will show knee structural degenerative progression [37, 71]. Furthermore, CRP is also known to activate the classical complement pathway by binding to C1q [72]. Although we did not identify C1q in the PCA-pairwise analysis, it was found as a contributor to component 1 in the sPLS-DA analysis.

Several other proteins showed differential regulation in the pairwise analysis, but are likely obesity-related and thus not specific to the entire OA population. These included IGHD and KDM4C, which were upregulated in OA-obese, and KHSRP, S100A9, and IGHV3-35, which were upregulated in OA-non-obese, whereas ADIPOQ and ACT were downregulated only in OA-obese.

The sPLS-DA complemented the differential expression findings and further identified proteins that discriminated both OA-obese and OA-non-obese from controls (component 1), as well as OA-obese from OA-non-obese individuals (component 2). This analysis offers an insight into which proteins contribute and how important the contribution of each is towards the discrimination of given groups.

Several proteins comprising component 1 (OA vs. controls) are molecules for which there are few or no reports as to their association with OA, and as such they offer novel potential candidates for OA biomarker research. The sPLS-DA revealed that the abovementioned CRTAC1 protein contributed the most towards the discrimination of OA and controls. The second-highest contributor was VDBP, and validation experiments demonstrated a significant difference between OA and controls but, as for FBN1, not between the OA subgroups. VDBP is a multifunctional protein that not only binds to vitamin D but also has several other physiological functions such as actin scavenging, binding of fatty acids, and chemotaxis [73]. There has been only one OA study, which showed increased levels of VDBP and vitamin D receptors in muscles from patients with end-stage knee OA compared to controls [74]. As knee muscles are gaining great interest regarding their impact on OA progression, this protein should be studied further as an OA biomarker.

Two other proteins in component 1, C1R and C1QC, are directly involved in the first step of the classical complement cascade. Of note, the contribution of C1QC comes from the OA-obese individuals and is thus probably related to obesity. C1 proteases can also cleave non-complement proteins including the LDL receptor-related protein 6, IGFBP5, and nucleolin [75]. The presence of complement proteins in this list was not unexpected, as previous studies reported the activation of the complement cascade in OA [37, 76, 77]. As complement proteins are activated in various diseases as well as in general inflammation processes, the abovementioned proteins would not be very useful as specific OA markers. It has previously been reported that a complement protein, like CRP, could be of use as a biomarker of cartilage degradation in OA-obese individuals when employed as a ratio with another molecule. Specifically, the adipokine adipsin, a component of the alternative complement pathway, was found to be strongly associated with knee cartilage volume loss in OA-obese individuals when combined as a ratio with MCP-1 [37].

SERPINF1, as its name indicates, belongs to the serpin family, but does not display the serine protease inhibitory activity shown by many of its family members. The SERPINF1 gene codes for the pigment epithelium-derived factor (PEDF), which was found to exacerbate joint cartilage damage in mice in an in vivo inflammatory joint destruction model (monosodium iodoacetate) [78]. However, PEDF production in the joint is somewhat controversial as it was found upregulated in human OA cartilage in two studies [78, 79], while another showed no expression in articular chondrocytes but an upregulation in osteophytic chondrocytes [80]. With regard to other musculoskeletal diseases, a form of the heritable disorder osteogenesis imperfecta, characterized by bone fragility and low bone mass, is caused by mutations in the SERPINF1 gene [81, 82]. Validation experiments showed a numerical trend toward significance when OA was compared to controls. However, this protein needs more support as a potential OA biomarker and further analysis is suggested.

The other less-contributing proteins in the sPLS-DA component 1 included PROS, a vitamin K-dependent plasma protein that functions as a cofactor for the anticoagulant protease (activated protein C) in the degradation of coagulation factors Va and VIIIa; SEPP1, a selenoprotein implicated as an extracellular antioxidant and in the transport of selenium to extra-hepatic tissues; ITIH4, a member of the serine protease inhibitor family with diverse functions such as a matrix-stabilizing molecule [83]; and APCS (amyloid P component serum), a glycoprotein capable of binding to apoptotic cells at an early stage and associated with the innate immune system. As for C1QC, the contribution of APCS in component 1 comes from the OA-obese individuals and is thus probably related to obesity. Although these proteins have not been specifically studied with respect to their role in OA, some have been associated with other arthritic pathologies including rheumatoid arthritis [84, 85], lupus [86], Kashin-Beck disease [87], and ankylosing spondylitis [88].

Data from sPLS-DA’s component 2 offered important information on proteins differentially regulated between OA-obese and OA-non-obese, which are potentially related to obesity. Obesity is a well-known and major risk factor for OA, but not all OA patients are obese. Thus, in the search for a specific OA biomarker, it is important to focus on molecules that are regulated in the general OA population and to avoid proteins related to other pathological conditions such as obesity. Among the 23 proteins identified in component 2, none were found in component 1 (discriminating OA from controls), and thus they are mostly related to conditions other than OA. Several of these proteins are involved in lipid metabolism: apolipoproteins A1, C1, and L1; paraoxonase 1 (PON1), which binds to HDL; HPR, which is known to associate with APOL1-containing HDL; and ADIPOQ, an adipokine involved in the control of fat metabolism and insulin sensitivity, which is also listed in the PCA analysis. Notably, a number of apolipoproteins and serpins identified as part of component 2 are among the highest contributors, and a number of those proteins have been studied as to their presence/levels in OA [19, 89,90,91,92,93,94]. Some others, such as SERPINC1, coagulation factor XII (involved in contact activation pathways), and protein C (PROC), are involved in the coagulation/fibrinolysis pathways, which are known to be activated in OA as well as in obesity [95,96,97,98,99,100]. However, it cannot be ascertained whether the activation of these pathways is specific to OA or rather to obesity. It is our opinion that the use of those proteins in the search for specific OA biomarkers should not be pursued. Nevertheless, in the OA-obese subgroup, some of these proteins, including APOA1 and SERPINC1, would be worthwhile studying, as they may amplify/accelerate the OA process in these individuals and thus be used as therapeutic targets.

Although our study has identified potential biomarkers, it has limitations. First, the cohorts used (proteomic: OAI; validation: CODING and NFOAS) included individuals from the USA and Canada, respectively. Validation of our results in cohorts from other countries would be required to determine whether those proteins could indeed be further studied as biomarkers. Second, an analysis by sex could also be performed, as it is well known that there are sex-specific differences in OA [101,102,103]. In this study, we could not perform such an analysis as we had a relatively modest sample size, which was limited by the methodology used. A technique allowing a greater sample size should permit it. Third, despite data filtering, some of the proteins (for example CRP, KDM4C, FBN1, and actin) showed a high number of imputed noise values across the whole dataset (Table S1), which might have biased the reported fold changes. However, some of the targeted proteins selected as new potential biomarkers for the entire OA population, including FBN1, were further validated using samples from an external cohort, which reduces the risk of reporting false biomarkers. Moreover, the use of a larger cohort combined with other proteomic analysis strategies could confirm our findings.

Conclusion

In OA, current diagnostic methods are not sensitive enough to identify the disease in its early stages. To improve therapeutic approaches for preventing or delaying its progression, the identification of specific molecules/biomarkers enabling early detection is needed. At present, there are no such validated specific serum biochemical markers. As a novel contribution, using proteomics/mass spectrometry and targeting disease-specific proteins, we identified four potential new serum biomarker candidates for the entire OA population: CRTAC1, FBN1, VDBP, and possibly SERPINF1.