Background

Porcine reproductive and respiratory syndrome (PRRS) is one of the most important infectious swine diseases throughout the world [1–3] and, more than two decades after its emergence, still has a major impact on pig health and welfare (reviewed by [4]). The responsible agent is an enveloped, positive-stranded RNA virus (PRRSV) with a genome of ca. 15 kb that belongs to the Arteriviridae family [5] and can cause late-term abortions in sows and respiratory symptoms and mortality in young or growing pigs. Once the virus has entered a herd, it tends to remain present and active indefinitely, causing severe economic losses and marketing problems due to high direct medication costs and the considerable animal health costs of controlling secondary pathogens [6, 7].

Pigs of all ages are susceptible to this highly infectious virus, which has been shown to be present in most pigs for the first 105 days post infection [8]. However, clinical manifestations vary with physiological status and age [9], as the virus uses several immune evasion strategies that impair the ability of the host to respond to the infection [4, 10, 11]. Weaning piglets, in particular, are likely to be exposed to the infection. Although PRRSV viraemia is often asymptomatic in these piglets, their productive performance is significantly decreased. Indeed, despite being sero-negative, persistently infected piglets still harbor PRRSV and have been shown to be a source of virus for susceptible animals [12].

Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) analysis allows the comparison of protein profiles obtained from a large number of diverse biological samples by combining two principles: chromatography by retention on a chip surface on the basis of defined properties (e.g. charge, surface hydrophobicity, or biospecific interaction with ligands) and mass spectrometry. It is thus distinct from common non-selective techniques, such as two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) and matrix-assisted laser desorption ionisation (MALDI) MS. SELDI-TOF MS has been widely used for diagnostic biomarker discovery and validation in blood serum/plasma, particularly in cancer research (reviewed by [13]), but also to characterize and identify biomarkers associated with viral and other infectious diseases [14–19]. The protein signatures identified by SELDI-TOF MS analysis thus have many potential applications in animal health, including early diagnosis of diseases, prediction of disease states, and monitoring of disease progression, recovery, and response to vaccination. However, few reports have been published on livestock applications [19–22].

Current needs in veterinary medicine and animal husbandry include tools that allow the early warning of diseases, especially during the incubation period and before the onset of clinical signs. Therefore, the objective of this study was to identify by SELDI-TOF MS a proteomic profile able to differentiate PRRSV-positive from PRRSV-negative weaning piglets raised in commercial farms and showing no clinical symptoms of the disease. We optimized the experimental conditions previously described [20] and validated 47 statistically significant discriminatory biomarkers. Among these, a combination of 14 biomarkers identified in F1 on CM10 at low focus mass correctly assigned the piglets to the PRRSV-positive or PRRSV-negative groups with a sensitivity of 77% and a specificity of 73%.

Results

To enable identification of medium-low abundant proteins, only samples with a total content of hemoglobin lower than 4.52 μg/mL were included in the study. Total hemoglobin absorbance and the resulting hemoglobin content were calculated for all the piglet sera in both discovery (n = 50) and validation (n = 70) phases of the study [ Additional file 1: Table S1 and Additional file 2: Table S2, respectively].

Fractionation of the sera resulted in six different fractions: F1 = pH9, F2 = pH7, F3 = pH5, F4 = pH4, F5 = pH3, and F6 = organic solvent. The fractions F1, F4, and F6 were analyzed on the three surfaces CM10, IMAC30, and H50 at both low and high focus masses. Fractions F2 and F3 were excluded from further analyses because preliminary data from 3 serum samples showed that they still contained elevated quantities of abundant proteins (such as albumin) and that both the quality of the spectra and the number of signals detected were very low. Fraction F5 was excluded because no signals were detected.

The fractions F1, F4, and F6 on the surfaces CM10, IMAC30, and H50 showed generally good signal intensities and low coefficient of variation (CV) values (< 30%) in both the discovery and validation phases. Exceptions were fraction F1 on IMAC30 (analyzed at high focus mass) and H50 (both low and high focus masses), as well as fraction F4 on H50 (low focus mass), which were therefore excluded from further analyses.

Discovery phase

A total of 50 pig sera, 25 from PRRSV-positive and 25 from PRRSV-negative piglets, were analyzed during the discovery phase of the study [ Additional file 1: Table S1].

We found a total of 785 protein peaks in the sera of all samples (Table 1). The most represented fraction was F6 (n = 381), followed by F4 (n = 223) and F1 (n = 181). On surface CM10 we identified 317 peaks, on IMAC30 302 peaks, and on H50 166 peaks. Furthermore, many more peaks were found in the low mass range (n = 512; 1–20 kDa) than in the high mass range (n = 273; 20–200 kDa).

Table 1 Protein peaks identified by SELDI-TOF MS in the discovery phase of the study

Of the total 785 peaks, 200 differed significantly (p < 0.05) between PRRSV-positive and PRRSV-negative piglets and thus discriminated between the two groups. Discriminatory peaks were found in F1 (n = 80), F4 (n = 49), and F6 (n = 71), on the surfaces CM10 (n = 107), IMAC30 (n = 58), and H50 (n = 35), and at both low (n = 110) and high (n = 90) focus masses (Table 1).

The best combination of sensitivity (80%) and specificity (76%) was obtained with the 22 discriminatory peaks of F1 on CM10 at low focus mass. Higher sensitivities were found with the 18 peaks of F4 on CM10 at low focus mass (87%), the 7 peaks of F6 on CM10 at low focus mass (85%), and the 12 peaks of F6 on CM10 at high focus mass (87%); however, the specificities of these peak sets were lower (64%, 66%, and 66%, respectively).
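For readers less familiar with these two measures, the following minimal Python sketch shows how sensitivity and specificity are conventionally computed from the numbers of correctly and incorrectly assigned animals. The counts used here are hypothetical: the per-piglet assignments are not listed in the text, and 20 of 25 and 19 of 25 were chosen only because they are consistent with the 80% and 76% reported for the 25 + 25 piglets of the discovery phase.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of PRRSV-positive piglets assigned to the positive group."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of PRRSV-negative piglets assigned to the negative group."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (assumption, not taken from the study data):
# 20 of 25 positives and 19 of 25 negatives correctly assigned.
sens = sensitivity(true_pos=20, false_neg=5)
spec = specificity(true_neg=19, false_pos=6)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

A peak set can therefore reach a higher sensitivity while losing specificity, which is the trade-off seen for the F4 and F6 peak sets.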

Validation phase

The validation phase was performed on 35 new PRRSV-positive and 35 new PRRSV-negative piglets using the same experimental conditions applied in the discovery phase [ Additional file 2: Table S2]. Of the total 200 peaks that were significant in the discovery phase, 47 were confirmed in the validation phase (Table 2).

Table 2 Discriminatory protein peaks identified in the discovery phase and confirmed in the validation phase

In particular, 28 peaks were confirmed on CM10 and 19 on IMAC30, whereas none of the peaks could be validated on the surface H50. Among the three fractions tested, F1 contained 28 peaks, F4 3 peaks, and F6 16 peaks. More peaks (n = 36) corresponded to small peptides (acquired at low focus mass, 1–20 kDa) than to larger proteins (n = 11) acquired at high focus mass (20–200 kDa).

The vast majority (42) of the peaks were up-regulated in PRRSV-positive piglets compared to PRRSV-negative piglets, while only 5 peaks (F1 on CM10: 5,468 and 5,536 Da; F6 on CM10: 14,843 Da; and F6 on IMAC30: 27,806 and 27,606 Da) were down-regulated (Table 2). In line with the results of the discovery phase, the peak combinations with the highest sensitivities (77% and 64.5%) and specificities (73% and 69.7%) were found on CM10 at low focus mass, with the 14 discriminatory peaks of F1 and the 6 discriminatory peaks of F6, respectively (Table 2). The correctly and incorrectly assigned piglets using these peaks are illustrated in the heat map of Figure 1; part 1A shows the 14 peaks of F1 and part 1B the 6 peaks identified in F6.

Figure 1

Heat map showing cluster analysis of the PRRSV-positive and PRRSV-negative piglets tested with the 2 combinations of discriminatory peaks that showed the highest sensitivity and specificity values. The x-axis of the heat maps shows the piglets analyzed in the validation phase (blue: PRRSV-positive; red: PRRSV-negative), while the y-axis displays the molecular weights in Dalton of the 14 significant discriminatory peaks identified in F1 (A) and the 6 peaks in F6 (B), both on the surface CM10 at low focus mass. The maps contain peak fold changes, Z-score normalized over all piglets. They are color coded, with red corresponding to up-regulation and green to down-regulation in PRRSV-positive piglets. As expected, piglets from the same group clustered together, although some incorrectly assigned piglets could be observed (as confirmed by the calculated sensitivity and specificity values, see text).

Principal component analysis (PCA) was performed on the profiles of the 47 discriminatory peaks identified during the discovery phase and confirmed during the validation phase, in order to identify and quantify independent sources of variation in the data. The first three components, PCA1, PCA2, and PCA3, accounted for 58.2%, 17.9%, and 12.9% of the total variability within the data and were plotted on the X, Y, and Z axes, respectively (Figure 2). These axes provide an overview of the variation between the individual samples and show how the samples grouped. Figure 2A shows in three dimensions that the peak profiles of PRRSV-positive piglets differed from those of PRRSV-negative piglets, with a good separation between the two groups, especially considering the high heterogeneity of the samples included in the study, as reported in the Methods section and in [ Additional file 1: Table S1 and Additional file 2: Table S2]. Furthermore, with the exception of a few outliers, PCA1 combined with PCA2 also separated the two piglet populations well (Figure 2B).

Figure 2

Principal component analysis (PCA) showing the effects of the 47 significant discriminatory peaks on piglets positive or negative to PRRSV infection. The figure shows a projection of the measured peak intensity profiles onto the space spanned by the three principal components (PCAs), the axes along which the data vary the most, for the 35 PRRSV-positive (blue) and the 35 PRRSV-negative (red) piglets of the validation study. PCA1, PCA2, and PCA3 accounted for 58.2%, 17.9%, and 12.9% of the variability in the data, respectively. The results are shown as a 3-dimensional plot of PCA1, PCA2, and PCA3 on the three axes (A) and as a 2-dimensional score plot of PCA1 against PCA2 (B).

Comparison with relevant protein peaks and immunity genes related to PRRSV infection in other studies

To provide an overview of the current literature and to try to correlate the discriminatory peaks identified in this study with relevant proteins, we summarized in Table 3 the molecular weights of several peaks that have been shown to be related to PRRSV infection.

Table 3 Comparison between relevant PRRSV-related and pig proteins identified in other studies and the discriminatory peaks found in this study

First of all, we summarized the available information on the PRRS viral proteins. The PRRSV genome is ca. 15 kb in size and consists of the 5' untranslated region (UTR), at least nine open reading frames (ORFs), and the 3' UTR followed by a polyadenylation tail. The expected and experimentally identified MWs for each viral protein from different studies are reported in Table 3, along with the MW of the closest discriminatory peak identified in the current study.

Interestingly, the MWs of the viral proteins ORF2b, ORF4, and ORF7 were very similar (MW difference ≤0.3 kDa) to those of up-regulated discriminatory peaks identified here (Table 3).

Next, we compared proteins related to PRRSV infection that were identified in additional studies (Table 3). Interestingly, all 9 peaks found by [28], and in particular the only one up-regulated in PRRSV-infected pigs (corresponding to the alpha 1 S (a1S)-subunit of porcine Haptoglobin), showed minimal MW differences (≤0.3 kDa) from up-regulated peaks identified in this study (Table 3).

Additional discriminatory peaks found in the current study were very similar (MW differences ≤0.3 kDa) to those identified in other PRRS-related proteomic studies (Table 3). They corresponded to the following proteins: Glyceraldehyde-3-phosphate dehydrogenase, Proteasome activator hPA28 subunit beta, S100 calcium binding protein A10, Galectin 1, and Gastric-associated differentially expressed protein YA61P [26]; Heat shock 27 kDa protein 1, Superoxide dismutase 2, Myoglobin, and Vacuolar protein sorting 29 [29]; Heat shock protein 27 kDa and Nucleoside diphosphate kinase A [30]; Heat shock 27 kDa protein 1, Galectin 1, and Ubiquitin [31].
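The comparison criterion used in this section (closest discriminatory peak, accepted when the MW difference is ≤0.3 kDa) can be sketched in a few lines of Python. This is only an illustration of the matching rule, not the procedure used to build Table 3: the function name is hypothetical, and the peak list contains only the six masses quoted in the running text rather than the full set of 47 validated peaks, so the output does not reproduce Table 3.

```python
def closest_peak(reference_kda, peaks_kda, tolerance_kda=0.3):
    """Return (nearest peak, absolute MW difference) in kDa, or None if
    no peak lies within tolerance_kda of the reference mass."""
    nearest = min(peaks_kda, key=lambda peak: abs(peak - reference_kda))
    difference = abs(nearest - reference_kda)
    if difference <= tolerance_kda:
        return nearest, difference
    return None


# Only the peak masses quoted in the text (kDa); the complete list of
# validated peaks is in Table 2 and is NOT reproduced here.
study_peaks = [5.468, 5.536, 14.843, 23.162, 27.606, 27.806]

match = closest_peak(5.560, study_peaks)     # a reference mass reported in [28]
no_match = closest_peak(9.244, study_peaks)  # nothing within 0.3 kDa in this subset
print(match, no_match)
```

With the complete peak list, the same rule would be applied to every reference mass in turn, and only the direction of regulation of the matched peak would need to be checked in addition.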

Discussion

In the present work, we show that proteomic fingerprint profiling is useful in research on PRRS immuno-pathogenesis and might also be a robust, large-scale diagnostic tool for assessing the proportion of PRRSV-positive weaning piglets without clinical symptoms in a herd. Indeed, we confirmed that the high-throughput capacity of the SELDI-TOF MS technology allows hundreds of samples to be screened for disease biomarkers in a relatively short time and with minimal sample preparation (as previously also reported by [32]).

Our results indicate that, of the 200 significant peaks found in the discovery phase, a total of 47 could be confirmed in the validation phase. These values are comparable with those of another study in which similar experimental conditions were applied to ovine sera [19].

Our findings also show that the combination of 14 discriminatory peaks in F1 on CM10 at low focus mass provided the highest sensitivity (77%) and specificity (73%) in assigning the piglets to the PRRSV-positive or PRRSV-negative groups. These percentages are in line with recent studies in humans using the same technology [33, 34]. The PCA results also showed a good separation of the piglets into the two groups under examination. This was achieved even though the tested piglets were highly variable and heterogeneous, as they were collected from several farms located in different regions and were exposed to the high environmental pressures typical of field conditions. This is mainly due to the careful choice of the serum samples: we tried to minimize the environmental differences by using the same experimental parameters (e.g. sample collection procedures, storage, handling) and by including similar numbers of pigs from the same breed (Large White) with very similar sex ratios and ages (at weaning).

In a preliminary work [20] we had successfully transferred the experimental conditions used in profiling experiments of human sera to pig sera. However, in that work, none of the potential biomarkers identified in the discovery phase could be validated in the subsequent validation phase, because of high sample heterogeneity and a high content of serum proteins (e.g. albumin) and contaminant proteins (e.g. hemoglobin), which had a negative effect on the detection of significant biomarkers, particularly those corresponding to the medium-low abundant proteins. It has been reported that low abundant proteins constitute about 1% of the entire human serum proteome, with the remaining 99% being comprised of only 22 proteins [35]. As it was therefore necessary to reduce the level of abundant proteins, in this follow-up study particular attention was given to the content of the contaminant protein hemoglobin. Only non-hemolytic samples with similar, low contents of hemoglobin were included in the study. Additionally, to further increase the likelihood of identifying statistically significant discriminatory biomarkers, we introduced a fractionation step based on anion-exchange chromatography. In a similar study performed with MALDI-TOF [28], where serum samples were analyzed in the first weeks (2–16) of PRRSV infection (also verified by PCR), a significantly lower number of peaks was identified compared to the present work. While protein peaks with M/Z values of 4.165, 4.460, 5.560, 8.330, 8.825, 12.250/12.550, and 14.010 kDa were found in 94 serum samples from 59 pigs, only one peak (9.244 kDa), corresponding to the alpha 1 S (a1S)-subunit of porcine Haptoglobin (Hp), was differentially up-regulated in PRRSV-infected pigs. Interestingly, all these peaks were very similar (MW difference ≤0.3 kDa) to discriminatory peaks identified here (details in Table 3).

Furthermore, two peaks identified in this study (23.162 and 14.843 kDa) were similar to peaks identified elsewhere (corresponding to Heat shock 27 kDa protein 1 [29–31] and Galectin 1 [26, 31], respectively). In accordance with [31], the identified peak corresponding to Heat shock 27 kDa protein 1 was up-regulated, while the peak corresponding to Galectin 1 was down-regulated. Thus, these proteins appear to be suitable candidates for future investigations.

Most of the significant biomarkers had a molecular mass lower than 20 kDa, confirming that small peptides are a rich source of relevant biomarkers in SELDI-TOF MS analyses, as previously reported in human [36] and ovine [19] sera. This may partly be because the low molecular weight (LMW) region of the serum proteome, called the peptidome, is an assortment of small intact proteins and proteolytic fragments of larger proteins, including several classes of physiologically important proteins such as peptide hormones and components of both the innate and adaptive immune systems (e.g. cytokines and chemokines) [35, 37]. This is particularly interesting because the patho-physiological state of the body’s tissues is predominantly reflected in the LMW and low abundance region of the serum proteome; specific protein fragments of the serum peptidome have been shown to be a rich source of disease-specific diagnostic information and have been correlated with disease stages in several studies (reviewed by [37]).

In agreement with other studies [29, 31], we found that the majority of the discriminatory biomarkers were up-regulated in PRRSV-positive piglets. This suggests that the corresponding proteins might be of viral origin or related to the innate or adaptive immune responses (e.g. cytokines, chemokines, acute phase proteins, toll-like receptors). In fact, several peaks were very similar (MW differences ≤0.3 kDa) to peaks reported in previous works, in particular to viral proteins (Table 3). The assignment of the discriminatory peaks to specific proteins will require additional work, because the SELDI-TOF technology can only detect masses/peaks of proteins that are differentially expressed between samples but cannot directly identify the proteins. This represents one of the major drawbacks of this technology compared to other methods. However, an advantage of SELDI-TOF MS in this regard is that its results might lead to the identification of new proteins not previously correlated with the disease, and thus of new biomarkers representing the field situation. The interpretation of these results and the continuation of this project will benefit from the imminent completion and publication of the sequence of the swine genome [38], which will contribute to a more precise annotation and a better identification of genes and proteins and will thus greatly facilitate genome-wide association studies.

Conclusions

Although a combination of peaks identified with different experimental conditions (e.g. using different fractions and different surfaces) might have provided higher discriminatory power, here we developed a PRRSV diagnostic test based on peaks identified with the same experimental conditions (e.g. fraction, surface, and focus mass), which can be reproduced at high throughput at reasonable cost. These results provide a set of proteomic biomarkers and related, optimized experimental conditions for high-throughput profiling of pig populations by SELDI-TOF MS for whole genome association studies, where identification of proteins underlying the phenotype can be made a posteriori. SELDI-TOF MS might therefore represent a complementary test or a possible alternative to classical (PCR) and more recent diagnostic methods (e.g. antibody detection in saliva) for profiling large herds of pigs at reasonable cost, using blood samples that are routinely collected for general veterinary inspections. In addition, these SELDI-TOF MS based tests could complement and provide a broader reference for emerging diagnostic methods and have potential applications for the detection of relevant proteins associated with highly heritable traits (e.g. acute phase proteins).

Methods

Piglets

A total of 120 serum samples of Large White piglets were selected from a well defined and characterized repository database, presently containing more than 20,000 swine samples from 18 different farms of the Lombardy region, Italy. Selection of the piglets aimed to minimize environmental factors and experimental conditions that might influence the results [39]. Hence, all piglets were from the same breed (Large White), had similar ages (weaning: 45–50 days), and their sera showed a low and comparable amount of hemoglobin (calculated as shown below).

In the discovery phase of the study, a total of 50 pig sera, 25 from PRRSV-positive and 25 from PRRSV-negative piglets, as determined by PCR (see below), were analyzed [ Additional file 1: Table S1]. The validation phase was performed with the same experimental conditions as the discovery phase. A total of 35 new PRRSV-positive and 35 new PRRSV-negative piglets were examined [ Additional file 2: Table S2]. The actual duration of infection for each individual PRRSV-positive piglet was unknown, as sera were collected and analyzed once for each piglet (at weaning: 45–50 days of age). None of the piglets was treated, as they did not show any symptom of the disease.

To ensure large variability and heterogeneity of the samples while minimizing environmental differences between the groups, we included in the PRRSV-positive and PRRSV-negative groups similar numbers of piglets of each sex that originated from several farms located in different regions. Specifically, PRRSV-positive piglets originated from 6 farms of the Lodi region (n = 8) and 7 farms of the Mantua region (n = 52), while PRRSV-negative piglets were collected in 5 farms around Lodi (n = 19) and 9 farms around Mantua (n = 41). Sex ratios males/females (44/76 overall) were very similar in PRRSV-positive (21 vs. 39) and PRRSV-negative (23 vs. 37) piglets.

Veterinary inspections of the overall clinical status of the piglets on the day of serum collection did not reveal any clinical symptoms of PRRS, including respiratory distress or sneezing.

Serum samples

All the serum samples were collected, stored, and handled in the same way. They were obtained for each piglet by storing 2 mL of whole blood without anticoagulants at room temperature (RT) for 4 h, followed by centrifugation at 3,500 rpm for 4 min. As suggested in a previous work [20], an abundant quantity of hemoglobin in the serum can hide early diagnostic biomarkers of PRRSV by competing with the other serum components for the binding sites of the chromatographic surfaces. To avoid the consequent signal suppression of the medium-low abundant proteins, only non-hemolytic samples were included in the present study.

A total of 200 clear, transparent sera without red pigmentation (low hemoglobin content) were first selected by visual screening from the total sera available in the database. Hemoglobin content of each serum sample was then determined according to [40] with minor modifications. A calibration curve was generated using five standard solutions (concentrations: 1.8, 3.6, 5.4, 7.2, and 9 μg/mL) of porcine hemoglobin diluted in 400 μL commercially available porcine serum (Sigma Aldrich, St Louis, MO, USA). Triplicate samples were incubated for 5 min at RT, then absorbance (E) was measured at 380, 415, and 440 nm. Absorbance at 380 and 440 nm was used to estimate the background absorbance flanking the absorbance peak (415 nm) of oxygenated hemoglobin. Absorbance due to hemoglobin was calculated as: E415 − [(E380 + E440)/2]. Hemoglobin absorbance values of the samples were converted to μg/mL of hemoglobin by means of the calibration curve. Of the 200 initial samples, a total of 120 samples having an absorbance ≤ 0.085 (corresponding to a hemoglobin content below 4.52 μg/mL) were included in the study: 50 in the discovery phase and 70 in the validation phase.
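The hemoglobin calculation can be illustrated with a short Python sketch. It applies the background correction E415 − [(E380 + E440)/2] and converts the result to μg/mL through a straight-line calibration. Two points are assumptions rather than facts from the study: the calibration is fitted here by ordinary least squares (the text does not state how the curve was fitted), and the absorbance readings of the standards and of the example sample are invented for illustration. Only the five standard concentrations are taken from the text.

```python
def hemoglobin_absorbance(e380, e415, e440):
    """Absorbance at 415 nm corrected for the mean flanking background."""
    return e415 - (e380 + e440) / 2


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Standard concentrations from the text (ug/mL).
standard_conc = [1.8, 3.6, 5.4, 7.2, 9.0]
# HYPOTHETICAL corrected absorbances of the standards (not study data).
standard_abs = [0.034, 0.068, 0.102, 0.136, 0.170]

slope, intercept = fit_line(standard_conc, standard_abs)

# HYPOTHETICAL readings for one serum sample.
sample_abs = hemoglobin_absorbance(e380=0.20, e415=0.27, e440=0.17)
sample_conc = (sample_abs - intercept) / slope
print(f"absorbance = {sample_abs:.3f}, hemoglobin = {sample_conc:.2f} ug/mL")
```

A sample would then be retained when its corrected absorbance does not exceed the inclusion threshold of 0.085 given above.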

Viral RNA extraction from the sera was performed following standard Roche procedures (High Pure Viral RNA Kit, Roche Diagnostics GmbH, Germany). Presence or absence of PRRSV was determined by multiplex PCR of conserved regions of viral ORF7 using primers and conditions previously described [41, 42]. The test also discriminated between European and American genotypes and could detect all the different viral strains present in the Lombardy region at the time of sample collection.

Serum fractionation

All the detailed steps of the SELDI-TOF MS process performed here are schematically represented [see Additional file 3: Figure S1]. The protocol follows the manufacturer’s instructions with minor modifications (Bio-Rad Laboratories, ProteinChip® Serum Fractionation Kit manual).

Briefly, serum samples were pre-fractionated with U9 buffer (9 M urea, 2% 3-[(3-Cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS), 50 mM Tris–HCl, pH = 9) to favor dissociation of protein complexes [ Additional file 3: Figure S1A].

Sera were fractionated using a ProteinChip Q strong anion-exchange resin filtration plate (Bio-Rad Laboratories, Hercules, CA). The filtration plate was re-hydrated and equilibrated with rehydration buffer (50 mM Tris–HCl, pH = 9) and the resin washed with rehydration buffer and U1 solution (1 M urea, 0.2% CHAPS, 50 mM Tris–HCl, pH = 9) [ Additional file 3: Figure S1B]. Serum samples were then mixed with U1 solution and added to the equilibrated filtration plate. Successive elutions with different buffers with decreasing pH and a final organic solvent (= different fractions) were collected by centrifugation. The buffers used included pH = 9 (50 mM Tris–HCl, 0.1% n-octyl β-D-glucopyranoside (OGP)), pH = 7 (50 mM 4-(2-Hydroxyethyl)piperazine-1-ethanesulfonic acid (HEPES), 0.1% OGP), pH = 5 (100 mM Na acetate, 0.1% OGP), pH = 4 (100 mM Na acetate, 0.1% OGP), pH = 3 (100 mM Na citrate, 0.1% OGP), and organic solvent (33.3% isopropanol, 16.7% acetonitrile, 0.1% trifluoroacetic acid) [ Additional file 3: Figure S1C].

ProteinChip arrays

The six pH fractions obtained (F1 = pH9, F2 = pH7, F3 = pH5, F4 = pH4, F5 = pH3, and F6 = organic solvent) were profiled on weak cation-exchange (CM10), immobilized metal affinity capture-copper (IMAC30-CU), and reverse-phase (H50) ProteinChip® arrays. The arrays were initially placed in a Bioprocessor (C50-30011, Bio-Rad Laboratories) and then treated according to their surface [ Additional file 3: Figure S1D]. Each sample fraction was then bound/spotted randomly to the different ProteinChip® arrays using array-specific binding buffers [ Additional file 3: Figure S1E]. A 50% saturated sinapinic acid (SPA) matrix solution was finally added to each spot on the ProteinChip array prior to the final analysis [ Additional file 3: Figure S1F].

SELDI-TOF MS analysis

ProteinChip arrays were read using a Ciphergen Protein-Chip Reader PCS4000 model and data were analyzed with Ciphergen Express Software (Ciphergen Biosystems).

Profiles were collected in the range 1–200 kDa at two different ion focus masses, 10 kDa (“low focus mass”) and 50 kDa (“high focus mass”). The instrument was calibrated for dataset collection using the all-in-one peptide standard (Bio-Rad Laboratories) in the 1–20 kDa range for the 10 kDa low ion focus mass and the all-in-one protein standard in the 20–200 kDa range for the 50 kDa high ion focus mass [ Additional file 3: Figure S1G].

Ciphergen Express software analysis

Spectra were normalized by total ion current, starting and ending at the M/Z of the collection ranges (1–20 or 20–200 kDa) after baseline subtraction and noise calculation. Outlier spectra were removed. The spectra were aligned to a reference spectrum with the normalization factor nearest 1.0. The spectra were aligned only if the percentage coefficient of variation was reduced after the alignment. Peaks from the different spectra were aligned using the cluster wizard function of the Ciphergen Express 3.0.6 software. The peak detection was automated within the M/Z range of analysis. Peaks were detected on the first pass when the signal-to-noise (S/N) ratio was 7 and the peak was 5 times the valley depth. Peaks below threshold were deleted and all first-pass peaks were preserved. Clusters were created within 0.15% of M/Z for each peak detected in the first pass. The clusters were completed by adding peaks with S/N ratio of 2 and two times the valley depth. P-values and ROC/AUC (Receiver Operating Characteristic/ Area Under Curve) values were calculated by using the P-value wizard.
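To make the clustering step more concrete, the sketch below groups second-pass peaks around first-pass peaks using the 0.15% M/Z window mentioned above. It is not the Ciphergen Express implementation, whose internals are not described here: the function name is hypothetical, the window is assumed to be ±0.15% around each first-pass peak, and the M/Z values of the candidate peaks are invented for illustration.

```python
def assign_to_clusters(first_pass, candidates, tolerance=0.0015):
    """Attach each candidate peak to the nearest first-pass peak if it lies
    within +/- tolerance (as a fraction of M/Z) of that peak."""
    clusters = {center: [] for center in first_pass}
    for mz in candidates:
        center = min(first_pass, key=lambda c: abs(c - mz))
        if abs(mz - center) <= center * tolerance:
            clusters[center].append(mz)
    return clusters


# First-pass cluster centres (Da): two peak masses quoted in the text.
first_pass = [5468.0, 14843.0]
# HYPOTHETICAL lower-S/N peaks from other spectra (Da).
candidates = [5470.0, 5480.0, 14830.0]

clusters = assign_to_clusters(first_pass, candidates)
print(clusters)
```

Under these assumptions the window is about ±8 Da at 5,468 Da and about ±22 Da at 14,843 Da, so the candidate at 5,480 Da falls outside its nearest cluster and is left unassigned.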

A 2-tailed t-test was used for statistical analysis of differences in peak intensity between groups. P-values below 0.05 were considered statistically significant. Principal component analysis (PCA) and an agglomerative hierarchical clustering algorithm were applied to investigate the pattern among the different statistically significant peaks.
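As an illustration of the group comparison, the following Python sketch computes the t statistic and degrees of freedom for the intensities of one peak in two groups. It assumes the pooled-variance (Student's) form of the test, since the variant used by the P-value wizard is not stated, and the intensity values are hypothetical. The two-tailed p-value would then be read from the t distribution with the returned degrees of freedom, for example with a statistics package; that step is omitted here to keep the sketch dependency-free.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Pooled-variance two-sample t statistic and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# HYPOTHETICAL normalized intensities of a single peak (not study data).
positive = [5.0, 6.0, 7.0, 8.0, 9.0]
negative = [4.0, 5.0, 6.0, 7.0, 8.0]

t_stat, dof = student_t(positive, negative)
print(f"t = {t_stat:.3f}, df = {dof}")
```

In the study design this comparison would be repeated for every peak cluster, with 25 + 25 spectra in the discovery phase and 35 + 35 in the validation phase.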

PCA is a multivariate data analysis method that transforms, without loss of essential information, a number of correlated variables into a smaller number of uncorrelated variables called principal components (PCs), which can sufficiently explain the data structure. The PCA transformation allows many variables to be studied simultaneously and shows how similar samples are correlated and grouped together. The data structure is visualized graphically by projecting the objects onto the space defined by the selected PCs (for details see [43]).

Finally, to evaluate the influence of external variables (e.g. sample processing and acquisition) on the system under study and to calculate the dispersion of the acquired data, the coefficient of variation (CV) was also calculated. The CV is a normalized measure of dispersion that expresses the standard deviation of the data as a percentage of the mean (intensity variation). Six commercially available serum samples were prepared and analyzed in parallel with the pig samples of both the discovery and validation phases. The CV was calculated for all fractions and surfaces by choosing 6 peaks evenly distributed along the entire range.
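The CV calculation itself is simple and can be sketched as follows. The intensities are hypothetical values for one peak measured across the six replicate control sera, and the use of the sample (n − 1) standard deviation is an assumption, as the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def cv_percent(intensities):
    """Coefficient of variation: standard deviation as a percentage of the mean."""
    return 100 * stdev(intensities) / mean(intensities)


# HYPOTHETICAL intensities of one peak in six replicate control sera.
replicate_intensities = [10.0, 12.0, 11.0, 9.0, 13.0, 11.0]

cv = cv_percent(replicate_intensities)
print(f"CV = {cv:.1f}%")
```

A fraction and surface combination would be considered acceptable when the CV values of its selected peaks stay below the 30% limit used in the Results.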