Background

Male infertility is a major problem for mammalian reproduction. The nature of sub-fertility due to the male is as complex as that of the female [1]. Male factor infertility accounts for approximately 40% of infertility cases in humans. For this reason it is very important to investigate the factors that affect male fertility. Here we used bovine spermatozoa to model human male fertility because cattle provide several advantages as a model for male factor infertility. These include good breeding records, fertility data records and progeny records. In cattle breeding, artificial insemination (AI), a common breeding technique, uses semen from genetically superior sires to inseminate cows. In the United States more than 70% of cows are bred by AI, but only ~50% of these matings result in a successful full-term pregnancy [2]. The underlying molecular events and mechanisms that determine the fertilizing potential of a semen sample are not well defined. A thorough understanding of these mechanisms is essential for obtaining consistently high reproductive efficiency and for reducing the cost and time lost by breeders.

Fertility traits of semen can be categorized as compensable or uncompensable [1, 3–7]. Defects in compensable traits (motility and morphology) can be overcome by increasing the number of spermatozoa per insemination [1]. Defects in uncompensable traits affect the function of spermatozoa during the later stages of fertilization and in embryonic development [1, 8] and, as such, cannot be compensated. Uncompensable traits include nuclear vacuoles [9], morphological deficiencies that do not suppress movement [4], and defective chromatin structure [10]. Low fertility in bulls has an uncompensable component that includes reduced cleavage rate and delayed pronuclear formation following in vitro fertilization [1, 11]. Currently available fertility assays assess defects that affect the functional competence of spermatozoa (i.e. capacitation, acrosome reaction, sperm-oocyte interaction) [8, 12]; however, these assays cannot definitively predict fertility. At present, neither the molecular nature of sperm fertility defects nor biomarkers for accurate fertility prediction are known [13].

Spermatozoa are transcriptionally inactive, so the only comprehensive method to understand the molecular functions in spermatozoa is proteomics [13]. Published proteomic studies of bull spermatozoa have described the sub-proteome of the sperm and the functions of proteins from its surrounding cells. Accessory gland (AG) proteins were shown to modulate important sperm functions after ejaculation and in the female reproductive tract, such as capacitation, acrosome reaction, sperm-oocyte interaction, and sperm protection [14]. Fertility-associated antigen (FAA), a heparin-binding protein from the seminal vesicles and prostate glands, binds to the spermatozoa membrane and modulates heparin-sperm interactions that are indicative of fertility [15]. Two seminal plasma proteins, prostaglandin-D-synthetase and osteopontin, were more abundant in the semen of high fertility bulls than in that of low fertility bulls [16, 17].

Here we describe a comprehensive proteomic analysis of bull sperm using differential detergent fractionation (DDF) two-dimensional liquid chromatography followed by electrospray ionization tandem mass spectrometry (DDF 2-LC ESI MS2; [18]). We compared protein expression profiles of sperm from high and low fertility bulls to characterize the differences in fertility at the protein level. Our results show that expression of 2051 and 2281 proteins was specific to high and low fertility bull spermatozoa, respectively and 1518 proteins were common to both. Differential expression of 125 proteins was significant between high and low fertility bull spermatozoa and these proteins are potential biomarkers for bovine male fertility. Biological systems utilize highly complex, interrelated metabolic and signaling pathways to function. Therefore, to identify signaling pathways involved in fertility, we carried out systems modeling of our proteomic datasets using Gene Ontology (GO) and Ingenuity Pathway Analysis (IPA). We identified differences in the signaling pathways between high and low fertility bull spermatozoa and found that EGF and PDGF signaling pathways were specific to high fertility.

Results

Proteome profiles of spermatozoa from high and low fertility bulls

We identified 3569 and 3799 proteins in high and low fertility group spermatozoa, respectively (see additional file 1). Among these, 1518 (20.4%) were common to both groups, and 2051 and 2281 proteins were unique to the high and low fertility groups, respectively (Figure 1). Only proteins identified by at least three peptides were included in the analysis for differential expression, and we identified 125 proteins as differentially expressed between the high and low fertility spermatozoa. Compared to low fertility bull spermatozoa, expression of 74 proteins increased and expression of 51 proteins decreased in high fertility spermatozoa (Table 1). Only a small proportion of the proteins identified in this study have been previously described: 15.1% of the high fertility group specific and 14.3% of the low fertility group specific proteins (Figure 1). The majority of the identified proteins are 'predicted' (i.e. predicted based on sequence similarity to known proteins in other species; such proteins are frequently found in the NRPD database for species that have had their genomes sequenced [19]). We contributed to the annotation of the newly sequenced bovine genome by experimentally confirming the in vivo expression of 4,313 electronically predicted proteins (see additional file 1). We also identified 'hypothetical' proteins (i.e. proteins predicted from nucleic acid sequences that have not been shown to exist by experimental protein chemical evidence [20]); these made up 10.6% and 9.8% of the proteins specific to high fertility and low fertility spermatozoa, respectively.
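The group-specific counts follow directly from the totals and the overlap reported above. The short sketch below only reproduces that arithmetic; the variable names are ours and are not part of the original analysis.

```python
# Counts reported in the text above.
identified_high = 3569   # proteins identified in high fertility spermatozoa
identified_low = 3799    # proteins identified in low fertility spermatozoa
common = 1518            # proteins identified in both groups

# Proteins unique to each group are the group total minus the overlap.
unique_high = identified_high - common
unique_low = identified_low - common

# Number of distinct proteins across both groups (union of the two sets).
total_distinct = unique_high + unique_low + common

print(unique_high, unique_low, total_distinct)
```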

Figure 1

Comparison of proteins identified in high fertility and low fertility spermatozoa. Distribution of predicted, known and hypothetical proteins is shown. a known proteins, b predicted proteins, c hypothetical proteins.

Table 1 Differentially expressed proteins.

Predicted and hypothetical proteins do not have any functional annotation associated with them and they represent ~80% of differentially expressed proteins between high and low fertility spermatozoa (Table 1). This poses a problem for meaningful biological modeling of our data without carrying out some functional annotation first. Therefore, we annotated all differentially expressed proteins in our data sets using AgBase GO resources.

Membrane and nuclear proteins

Membrane and nuclear proteins are fundamental for inter- and intracellular signaling and are thus fundamental for modeling cell-cell interactions. Sperm-oocyte fusion is a key element of fertilization. This process is facilitated by sperm surface proteins and leads to specific binding of the sperm surface-active component with the egg zona pellucida and, ultimately, sperm-egg fusion [21]. To identify proteins from the sperm membrane and the nucleus that function in cell fusion, we focused on membrane and nuclear proteins identified in our datasets. Based on the GO associations of known proteins, 40.6% (395) are membrane proteins. We also identified 112 nuclear proteins based on GO associations. Biological process annotation of membrane proteins revealed that the majority are involved in transport (33%), cell communication (18%) and metabolism (17%).

We GO annotated all differentially expressed proteins and applied the generic GO Slim [22] to identify 7 functional super-categories represented in differentially expressed proteins in high fertility spermatozoa. Most GO Slim categories, including processes such as metabolism, cell communication and cell motility showed overall up regulation of protein expression in the high fertility group while transport proteins showed an overall down regulation in the high fertility group (Figure 2).

Figure 2

Overall effects in GO Slims of differentially expressed proteins of high and low fertility spermatozoa. Biological process GO annotations of all significantly altered proteins between high and low fertility spermatozoa were used to generate GO Slims. For each GO Slim, the difference in the numbers of proteins with increased expression and the number of proteins with decreased expression (relative to low fertility spermatozoa) was calculated to estimate the net regulatory effect.
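The net regulatory effect described in this legend is a simple difference of counts per GO Slim category. A minimal sketch of that calculation is given below; the input format and the toy proteins are hypothetical and are not taken from Table 1.

```python
from collections import Counter

def net_regulatory_effect(proteins):
    """Return {GO Slim term: (# increased) - (# decreased)}.

    `proteins` is an iterable of (direction, slim_terms) pairs, where
    direction is "up" or "down" relative to low fertility spermatozoa.
    This input layout is an assumption made for illustration.
    """
    net = Counter()
    for direction, slim_terms in proteins:
        if direction not in ("up", "down"):
            raise ValueError(f"unknown direction: {direction!r}")
        step = 1 if direction == "up" else -1
        # Count each protein at most once per GO Slim term.
        for term in set(slim_terms):
            net[term] += step
    return dict(net)

# Hypothetical toy data, not from the study.
example = [
    ("up", ["metabolism", "cell communication"]),
    ("up", ["metabolism"]),
    ("down", ["transport"]),
    ("down", ["transport", "metabolism"]),
]
effects = net_regulatory_effect(example)
print(effects)
```

A positive value indicates net up regulation of that category in the high fertility group, and a negative value indicates net down regulation.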

High fertility and low fertility sperm proteomes: molecular network and pathway analysis

Protein identification from biological samples on a global scale is important. However, there is a need to move beyond this level of analysis: instead of simply enumerating a list of proteins, the analysis needs to include their interactions as parts of complexes, pathways and biological networks. To achieve this level of analysis with our high fertility and low fertility spermatozoa proteomic datasets we used Ingenuity Pathway Analysis (IPA). At IPA thresholds for significance, 71 and 73 networks and 68 and 73 functions/diseases were significantly represented in the proteomes of high fertility and low fertility spermatozoa, respectively. The top 10 functions/diseases (ranked by significance) and the associated signaling pathways are shown in Table 2 and Table 3 for the proteomes of the high and low fertility groups, respectively. Analysis of the top 10 functions revealed that functions such as cellular movement and cell-to-cell signaling and interaction were identified only in the high fertility sperm proteome (Table 2), whereas functions such as cell death and reproductive system disease were identified only in the low fertility sperm proteome (Table 3).

Table 2 Top ten functions/diseases and their respective top ten signaling pathways in high fertility group spermatozoa.
Table 3 Top ten functions/diseases and their respective top ten signaling pathways in low fertility group spermatozoa.

Compared to the low fertility sperm proteome (9), the high fertility sperm proteome (20) had an approximately 2-fold enrichment in signaling pathways. However, the number of significant metabolic pathways represented was comparable between the low (8) and high (9) fertility spermatozoa. Epidermal growth factor (EGF) signaling was the most prominent signaling pathway specific to high fertility sperm (Figure 3). EGF signaling is known to promote proliferation, survival, and differentiation of a wide variety of mammalian cells [23]. In addition to the EGF signaling pathway, platelet-derived growth factor (PDGF) signaling, peroxisome proliferator-activated receptor (PPAR) signaling, interleukin (IL)-4 signaling, NF-κB signaling, chemokine signaling, and insulin-like growth factor (IGF)-1 signaling were identified only in high fertility spermatozoa. In the low fertility group, Cell cycle: G2/M DNA damage checkpoint regulation was the most significant pathway, followed by integrin signaling.

Figure 3

EGF signaling pathway generated by the Ingenuity Pathway Analysis (IPA) software. EGF and PDGF signaling pathways were the top two pathways in the top 10 functions/diseases associated with the high fertility spermatozoa (Table 2). Each node represents a protein; proteins in shaded nodes were found in the high fertility spermatozoa dataset (see additional file 1) while proteins in clear nodes were not found in the high fertility spermatozoa dataset.

Proteins with significantly altered expression: molecular network and pathway analysis

Systems analysis of global proteomes revealed that some signaling pathways are differentially represented between the high and low fertility group spermatozoa. To further analyze these differentially represented pathways, we carried out IPA analysis with just the 125 differentially expressed proteins. In high fertility spermatozoa, expression of 74 proteins was increased when compared to low fertility spermatozoa. IPA analysis identified three significant networks with scores of 22, 19, and 13, respectively. Proteins identified in the top three networks are participants in EGF signaling, PDGF signaling, oxidative phosphorylation, and pyruvate metabolism pathways. Expression of two proteins involved in oxidative phosphorylation, ATP synthase, H+ transporting, mitochondrial F1 complex (ATP5B) and cytochrome c oxidase subunit III (COX3), and of casein kinase II, which is involved in EGF and PDGF signaling, was higher in high fertility spermatozoa than in low fertility spermatozoa (Table 1). IPA also identified pyruvate metabolism as the most significant pathway among the up regulated proteins of high fertility spermatozoa. In the low fertility sperm proteome, expression of 51 proteins was increased when compared to high fertility spermatozoa. IPA analysis identified two significant networks among the highly expressed proteins of low fertility sperm. Proteins identified in these two networks are participants in integrin signaling and estrogen receptor signaling.

Discussion

Male fertility can be described as the success of spermatozoa in fertilizing oocytes and of the resulting zygotes in continuing through embryonic and fetal development until birth [11]. In this study we used bovine spermatozoa to study fertility because they can serve as a model for understanding human male infertility and reproductive diseases. Studying bovine male fertility on its own merit also has agro-economic implications for the cattle industry worldwide.

A spermatozoon must reach the site of fertilization and be capacitated for successful fertilization to occur. A subsequent step is the acrosome reaction, characterized by fusion of the spermatozoon's outer acrosomal membrane with the overlying plasma membrane [8]. The molecular mechanisms and signal transduction pathways mediating the processes of capacitation and acrosome reaction have been partially defined [8]. Proteomic analysis of the bull sperm cytosolic fraction showed enrichment for tyrosine kinases, which are essential for phosphorylation of specific sperm proteins during capacitation [24]. The abundance of a variety of proteins from cells surrounding the sperm has been proposed to indicate male fertility [2, 14, 15]. Most previous studies used 2-dimensional electrophoresis (2-DE) for isolation and identification of sperm proteins [13, 25–28]. To our knowledge this is the first comprehensive non-electrophoretic proteomic study of the bull sperm proteome. The aim of our study was to identify proteins that were differentially expressed between high and low fertility bull spermatozoa, and the interrelated metabolic and signaling pathways that have a role in fertility.

We identified 125 proteins as differentially expressed between the high and low fertility sperm, even though 1518 proteins were common to both groups and about 2000 were unique to each. The reason for this apparent discrepancy is that we took a conservative approach to the statistical analysis: only proteins identified by at least three peptides were included in the analysis for differential expression, and the statistical method used in ProtQuant is very conservative. ProtQuant specifically addresses the issue of "missing" mass spectra that occurs in all 2-D LC MS2-based expression proteomics methods. No other published method (either non-isotopic or isotopic) addresses this issue. Missing mass spectra are due to the inherent limitations of mass spectrometers, the probabilistic nature of sampling and the cutoffs used to determine "true" assignments of peptides to mass spectra [29]. ProtQuant is based on the sum of Xcorr method, which increases the specificity of spectral counting and reduces the type I errors of differential expression. Regardless, proteins were analyzed from each of the three areas represented in Figure 1, and differentially expressed proteins occurred in all three (i.e. proteins unique to the high and low fertility sperm as well as those common to both).

From proteome profiles of specific cells or tissues, one acquires large datasets that are inherently complex. We therefore considered it beneficial to model our bovine sperm proteome datasets using GO and IPA. From GO associations of differentially expressed proteins we found that there was a comparative up regulation of three biological processes in high fertility spermatozoa: metabolism, cell communication and cell motility (Figure 2).

Up regulation of metabolism is consistent with the fact that capacitation is coupled to a specific type of metabolism, that is, glycolysis or oxidative respiration [30]. Pyruvate metabolism and glycolysis were the most significant metabolic pathways represented in the high fertility sperm proteome by IPA. In glycolysis, expression of pyruvate kinase (PKM2) was higher in high fertility spermatozoa. PKM2 catalyzes the production of pyruvate and ATP from phosphoenolpyruvate. Pyruvate formed in this process serves as an energy source for cells [31]. Impaired or lower pyruvate metabolism could limit the cell's ability to produce energy, and this could be one of the reasons for reduced fertility in the low fertility group.

Expression of COX3 and ATP5B, which are involved in oxidative respiration, was higher in high fertility spermatozoa than in low fertility spermatozoa. COX3 is a member of the large transmembrane protein complex found in the mitochondrion and is the last protein in the electron transport chain. Coupling of electron transport to oxidative respiration maintains the high mitochondrial transmembrane potential required for mitochondrial ATP production [32]. ATP5B catalyzes the production of ATP from ADP in the presence of a proton gradient across the mitochondrial membrane, and this ATP is used for sperm motility and capacitation [33].

Communication between sperm and oocyte is critical for successful fertilization. We found that there was up regulation of cell communication in the high fertility sperm proteome when compared to the low fertility sperm proteome (Figure 2). Several signaling pathways are necessary to bring about cell-to-cell communication. EGF signaling and PDGF signaling were the top two significant signaling pathways identified in high fertility spermatozoa. EGF and PDGF signaling pathways stimulate tyrosine phosphorylation of various MAP kinases and their upstream activators MEK1, MEK2 and MEKK [34, 35]. EGF signaling has an important role in sperm capacitation as it stimulates tyrosine phosphorylation of many proteins [36]. In addition, EGF signaling activates phospholipase C (PLC) [36] (Figure 3). PLC is important for the acrosome reaction (AR), fertilization and embryo development. PLC catalyzes the production of inositol 1,4,5-trisphosphate (IP3) from phosphatidylinositol 4,5-bisphosphate. IP3 generated by PLC activates the extracellular calcium influx required for the AR by binding to the IP3 receptor (IP3R) gated calcium channel located on the acrosome membrane [37]. Mutations in mouse PLCB1 reduced the AR rate, fertilization rate and embryo development [38]. EGF signaling was specific to high fertility bull sperm. Defects in EGF signaling in low fertility spermatozoa may prevent capacitation.

Expression of casein kinase 2 (CKII) prime polypeptide, a participant in EGF signaling, was higher in high fertility spermatozoa than in low fertility spermatozoa (Table 1). CKII is preferentially expressed in late stages of spermatogenesis and is involved in sperm chromatin decondensation after sperm-oocyte fusion [39, 40]. CKII deficient mice are infertile with oligospermia and globozoospermia [40]. EGF signaling also induces actin polymerization in bovine sperm capacitation [41]. Actin polymerization is essential for incorporation of sperm into the egg cytoplasm [42] and for sperm nuclei decondensation [43].

Comparing the proteome profiles of bull sperm of high and low fertility revealed some molecular features associated with low fertility. Cell cycle: G2/M DNA damage checkpoint regulation was the most significant signaling pathway in low fertility bull sperm, followed by integrin signaling (Table 3). The G2/M DNA damage checkpoint could help in maintaining the integrity of the genome during different stages of development. Progression through different phases of the cell cycle requires the sequential activation of various cyclin dependent kinases, and these kinases in turn are regulated by integrin signaling. Integrin signals are necessary for cells to traverse the cell division cycle [44]. These two pathways may be a compensatory response to the reproductive system disease function, which was identified only in low fertility sperm (Table 3).

In addition to differences in signaling and metabolic pathways between high and low fertility spermatozoa, we identified differences in protein expression that had implications in sperm motility. Expression of A-kinase anchor protein-4 (AKAP4) was significantly higher in high fertility spermatozoa (Table 1). AKAP4 is a major fibrous sheath protein of the principal piece of the sperm flagellum. AKAP4 recruits Protein kinase A to the fibrous sheath and facilitates local phosphorylation to regulate flagellar function in humans [45]. It also serves as a scaffolding protein for signaling proteins and proteins involved in metabolism. Higher expression of AKAP4 in the high fertility group sperm could result in higher motility.

Conclusion

In summary, this is the first comprehensive description of the bovine spermatozoa proteome. Comparative proteomic analysis of high fertility and low fertility bulls, in the context of protein interaction networks, identified putative molecular markers associated with the high fertility phenotype. We observed marked differences in signaling and metabolic pathways between high fertility and low fertility spermatozoa that have implications for sperm capacitation, acrosome reaction and sperm-oocyte communication.

Methods

Selection of high and low fertility bulls

Frozen semen samples and bull fertility data (see additional file 2) from six mature and progeny tested Holstein bulls with satisfactory semen quality were provided by Alta Genetics (Watertown, WI).

Sample and Data Sources

The fertility data were established by a progeny testing program named Alta Advantage®, which is the industry's most reliable source of fertility information. It consisted of insemination records collected from 180 well managed partner dairy farms located in different geographical regions across the United States. This breeding program provided the advantages of DNA verification of the paternity of the offspring, and diagnosed pregnancies by veterinary palpation, instead of just relying on non-return rates 60–90 days after breeding.

Bull Fertility Prediction

To predict fertility of the bulls from the given source, a sub-set of data was generated consisting of 962,135 insemination records from 934 bulls, with an average of 1,030 breedings per bull (range 300 to 15,194). The environmental and herd management factors that influence fertility performance of sires were adjusted for using threshold models similar to those previously published by Zwald et al. [46, 47]. Parameter estimates and fertility predictions were obtained using Probit.F90 software developed by Y. M. Chang [48].

Therefore, for the definition of fertility, instead of relying only on the number of pregnant cows (verified using palpation by a veterinarian or ultrasound examination) divided by the total number of cows examined for pregnancy, we considered the outcome of each breeding event and adjusted for environmental factors such as the effects of herd-year-month, parity, cow, days in milk, and sire proven status (young, proven, colored) in order to rank the bulls based on their breeding values for fertility. Further, the fertility of each bull was calculated and expressed as the percent deviation of its conception from the average conception of all bulls having at least 300 breedings in the data set.
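The final step, expressing each bull's fertility as a deviation from the average of all bulls with at least 300 breedings, can be illustrated as follows. This is a simplified sketch only: it uses raw conception rates rather than the model-adjusted values described above, it treats "percent deviation" as a difference in percentage points (an assumption), and the bull identifiers and counts are invented.

```python
def fertility_deviation(records, min_breedings=300):
    """Return {bull: deviation of conception rate from the group mean}.

    `records` maps a bull identifier to (conceptions, breedings).
    Bulls with fewer than `min_breedings` breedings are excluded, as in
    the text. Deviations are in percentage points (our assumption).
    """
    rates = {
        bull: conceptions / breedings
        for bull, (conceptions, breedings) in records.items()
        if breedings >= min_breedings
    }
    if not rates:
        raise ValueError("no bull meets the minimum number of breedings")
    mean_rate = sum(rates.values()) / len(rates)
    return {bull: 100.0 * (rate - mean_rate) for bull, rate in rates.items()}

# Hypothetical records: (conceptions, breedings).
toy_records = {
    "bull_A": (400, 1000),
    "bull_B": (300, 1000),
    "bull_C": (350, 1000),
    "bull_D": (10, 20),   # too few breedings, excluded
}
deviations = fertility_deviation(toy_records)
```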

Selection of high and low fertility bulls

For this study, we used an arbitrary threshold for classifying high and low fertility bulls. The bulls with the highest and lowest fertility deviation from the average and with the highest reliability (>1,000 breedings/bull) were selected. The difference in the average fertility indexes between the high and low fertility groups was 5.46%, obtained from bulls having adequate records for higher reliability. Three bulls that scored 5.3% above the average were considered high fertility, and three bulls that scored 10.76% below the average were defined as low fertility (see additional file 2). Two separate pools of sperm cells (3 × 10^8) were constituted by mixing equal amounts of sperm cells from either the three low or the three high fertility bulls. The experiment was replicated three times.

Isolation of pure sperm cells

Spermatozoa were collected from high and low fertility bulls and frozen in 0.25 ml straws. For each bull, the total spermatozoa collected were purified by Percoll gradient centrifugation. A 90% Percoll solution in water was prepared with DL-Lactate (19 μM), CaCl2 (2 μM), NaHCO3 (25 mM), MgCl2 (400 μM), KCl (3 μM), NaH2PO4 (310 μM), NaCl (2 mM) and Hepes (10 mM). The 90% Percoll solution was diluted to 45% with sperm diluent medium (1 mM pyruvate, 10 mM Hepes, 0.021 mM DL-Lactate in Tyrode's salt solution, pH 7.4). A density gradient of Percoll was prepared in an Eppendorf tube (0.1 ml of the 90% fraction under 1 ml of the 45% fraction). Spermatozoa were thawed at 35°C for 1 min and layered on top of the Percoll gradient. The spermatozoa were pelleted by centrifugation (956 g; 15 min) followed by two washes in phosphate-buffered solution (PBS) (956 g; 5 min). The total sperm count was obtained using an Improved Neubauer Hemacytometer, and 10^8 sperm cells were aliquoted and stored at -80°C.

Protein extraction by DDF

DDF sequentially extracts proteins from different cellular compartments using a series of detergents, and this off-line pre-fractionation step in sample preparation increases the proteome coverage. Another advantage of DDF is that the fraction in which a protein is identified indicates its likely cellular location. Proteins were isolated using DDF as previously described [18]. Cytosolic proteins were extracted by six sequential incubations in a buffer containing digitonin (10 min each); next, a fraction containing predominantly membrane proteins was isolated by incubating the cells in 10% Triton X-100 buffer for 30 min and then removing the soluble protein. Nuclear DDF buffer containing deoxycholate (DOC) was then added to the remaining insoluble material, which was subjected to freeze-thawing to disrupt the nucleus. Nuclear proteins were collected from the resulting soluble fraction, and the sample was then aspirated through an 18-gauge needle and treated with a mixture of DNase I (50 U; Invitrogen, Carlsbad, CA) and RNase A (50 mg; Sigma-Aldrich, St Louis, MO) at 37°C for 1 h to digest nucleic acids. Any remaining pellet, containing the least soluble proteins, was treated with a buffer containing 5% SDS.

Proteomics

Proteomic analysis was carried out with triplicate samples of spermatozoa from the high fertility group and the low fertility group as described [19]. Proteins were precipitated with 25% trichloroacetic acid to remove salts and detergents. Protein pellets were resuspended in 0.1 M ammonium bicarbonate with 5% HPLC grade acetonitrile (ACN), reduced (5 mM, 65°C, 5 min), alkylated (iodoacetamide, 10 mM, 30°C, 30 min) and then trypsin digested until there was no visible pellet (sequencing grade modified trypsin, Promega; 1:50 w/w, 37°C, 16 h). Peptides were desalted using a peptide macrotrap (Michrom BioResources, Inc., Auburn, CA) and eluted using a 0.1% trifluoroacetic acid, 95% ACN solution. Desalted peptides were dried in a vacuum centrifuge and resuspended in 20 μL of 0.1% formic acid and 5% ACN. LC analysis was accomplished by strong cation exchange (SCX) followed by reverse phase liquid chromatography (RP-LC) coupled directly in line with an ESI ion trap mass spectrometer (LCQ Deca XP Plus; ThermoElectron Corporation; San Jose, CA). Samples were loaded into an LC gradient ion exchange system (Thermo Separations P4000 quaternary gradient pump coupled with a 0.32 × 100 mm BioBasic strong cation exchange column). A flow rate of 3 μL/min was used for both SCX and RP columns.

A salt gradient was applied in steps of 0, 10, 15, 20, 25, 30, 35, 40, 45, 50, 57, 64, 90, and 700 mM ammonium acetate in 5% ACN, 0.1% formic acid, and the resultant peptides were loaded directly into the sample loop of a 0.18 × 100 mm BioBasic C18 reverse phase liquid chromatography column of a Proteome X workstation (ThermoElectron). The reverse phase gradient used 0.1% formic acid in ACN. For the 0, 10, 15, 25, 30, 45, 64, 90, and 700 mM salt gradient steps, the ACN concentration was increased in a linear gradient from 5% to 30% in 20 min and then from 30% to 95% in 7 min, followed by 5% for 10 min. For the 20, 35, 40, 50 and 57 mM salt gradient steps, the ACN concentration was increased in a linear gradient from 5% to 40% in 65 min, followed by 95% for 15 min and 5% for 20 min.

The mass spectrometer was configured to optimize the duty cycle length with the quality of data acquired by alternating between a single full MS scan followed by three tandem MS scans on the three most intense precursor masses (as determined by Xcalibur software in real time) from the full scan. The collision energy was normalized to 35%. Dynamic mass exclusion windows were set at 2 min, and all of the spectra were measured with an overall mass/charge (m/z) ratio range of 300–1700.
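The acquisition cycle described above (one full MS scan, then tandem MS on the three most intense precursors, with a 2 min dynamic exclusion and an m/z range of 300–1700) can be sketched as below. This is an illustration of the selection logic only, not the Xcalibur implementation; the function name, the data layout and the exclusion mass width `tol` are assumptions, since the exclusion width is not stated in the text.

```python
def select_precursors(full_scan, scan_time_min, excluded, top_n=3,
                      exclusion_min=2.0, mz_range=(300.0, 1700.0), tol=0.5):
    """Pick up to `top_n` precursor m/z values for tandem MS.

    full_scan: list of (mz, intensity) peaks from one full MS scan.
    excluded:  dict {mz: time at which exclusion expires}; updated in place.
    tol:       assumed exclusion half-width in m/z units (hypothetical).
    """
    # Release precursors whose exclusion window has expired.
    for mz in [m for m, until in excluded.items() if until <= scan_time_min]:
        del excluded[mz]

    # Keep peaks inside the scan range that are not currently excluded.
    candidates = [
        (mz, intensity) for mz, intensity in full_scan
        if mz_range[0] <= mz <= mz_range[1]
        and not any(abs(mz - ex) <= tol for ex in excluded)
    ]

    # Most intense first, then take the top N.
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    chosen = [mz for mz, _ in candidates[:top_n]]

    # Exclude the chosen precursors for the next `exclusion_min` minutes.
    for mz in chosen:
        excluded[mz] = scan_time_min + exclusion_min
    return chosen

# Hypothetical full scan: (m/z, intensity).
scan = [(500.0, 100), (600.0, 90), (700.0, 80), (800.0, 70), (250.0, 999)]
exclusion_list = {}
first = select_precursors(scan, 0.0, exclusion_list)
second = select_precursors(scan, 1.0, exclusion_list)
third = select_precursors(scan, 2.5, exclusion_list)
```

Dynamic exclusion lets lower-intensity precursors (here the peak at m/z 800) be fragmented while the most intense ones are temporarily ignored.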

All searches were done using TurboSEQUEST™ (Bioworks Browser 3.2; ThermoElectron). Mass spectra and tandem mass spectra were searched against an in silico trypsin-digested database of bovine RefSeq proteins downloaded from the National Center for Biotechnology Information [NCBI; 12/26/2006; 24,853 entries]. Trypsin digestion and mass changes due to cysteine carbamidomethylation (C, 57.02 Da) and methionine mono- and di-oxidation (15.99 Da and 32 Da) were included in the search criteria. The peptide (MS precursor ion) mass tolerance was set to 1.5 Da and the fragment ion (MS2) mass tolerance was set to 1.0 Da. The Rsp value was required to be less than 5.
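To make the in silico digestion and the mass modifications concrete, the sketch below cleaves a sequence after K or R (but not before P, a commonly used trypsin rule that we assume here) and computes a monoisotopic peptide mass with optional carbamidomethylation and methionine mono-oxidation. It is not the TurboSEQUEST implementation; missed cleavages are ignored, and the function names and example sequence are hypothetical. The modification masses are those given in the text; the residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
CARBAMIDOMETHYL = 57.02   # on cysteine, value from the text
MET_OXIDATION = 15.99     # mono-oxidation of methionine, value from the text

def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def peptide_mass(peptide, carbamidomethyl_cys=True, oxidized_met=0):
    """Neutral monoisotopic mass with the modifications named in the text."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    if carbamidomethyl_cys:
        mass += CARBAMIDOMETHYL * peptide.count("C")
    mass += MET_OXIDATION * oxidized_met
    return mass

# Hypothetical sequence used only to show the cleavage rule.
fragments = tryptic_digest("AKRPGKC")
print(fragments)
```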

As a primary filter we first limited our Sequest search output to include only peptides ≥ 6 amino acids long, with ΔCn ≥ 0.08 and Sequest cross correlation (Xcorr) scores of 1.5, 2.0 and 2.5 for +1, +2, and +3 charge states, respectively. We next used a decoy database search strategy [49] (using the same primary filter as for the real database search) to calculate P values for peptide identifications, as this allows us to assign the probability of a false identification based on the real data from the experiment itself [49–52]. Since the accuracy of peptide identification depends on the charge state, we calculated P values for +1, +2, and +3 charge states separately. The probability that a peptide identification from the original database is really a random match (P value) is estimated based on the probability that a match against the decoy database will achieve the same Xcorr [51, 53]. Protein probabilities were calculated exactly as described [54, 55] using only peptides with P < 0.05, and only those proteins were used for further modeling. All protein identifications and their associated MS data have been submitted to the PRoteomics IDEntifications database (PRIDE; [56]) and the PRIDE accession numbers are 1883–1888.
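The primary filter and the decoy-based P value can be sketched as follows. We assume the Xcorr values are minimum thresholds, and we use a simple empirical estimate (the fraction of decoy matches of the same charge state scoring at least as high as the observed Xcorr); the cited methods may fit the decoy distribution differently. Function names and example values are hypothetical.

```python
# Charge-state specific Xcorr thresholds from the text (assumed to be minima).
XCORR_MIN = {1: 1.5, 2: 2.0, 3: 2.5}
MIN_LENGTH = 6
MIN_DELTA_CN = 0.08

def passes_primary_filter(peptide, charge, xcorr, delta_cn):
    """Apply the peptide length, delta-Cn and charge-specific Xcorr filters."""
    threshold = XCORR_MIN.get(charge)
    if threshold is None:          # charge states other than +1..+3 are dropped
        return False
    return (len(peptide) >= MIN_LENGTH
            and delta_cn >= MIN_DELTA_CN
            and xcorr >= threshold)

def decoy_p_value(xcorr, decoy_xcorrs):
    """Empirical P value from decoy matches of the same charge state.

    Returns the fraction of decoy Xcorr scores >= the observed score.
    This is a simplified estimator chosen for illustration.
    """
    if not decoy_xcorrs:
        raise ValueError("no decoy matches for this charge state")
    hits = sum(1 for d in decoy_xcorrs if d >= xcorr)
    return hits / len(decoy_xcorrs)

# Hypothetical decoy scores for the +2 charge state.
decoys_plus2 = [1.6, 1.8, 2.0, 2.5]
p = decoy_p_value(2.4, decoys_plus2)
```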

Differential protein expression

Label-free quantification approaches are designed to quantify relative protein abundances directly from high throughput proteomic analyses without labeling techniques. Here, we used ProtQuant [29], a Java-based tool for label-free quantification that uses a spectral counting method with increased specificity (and thus decreased false positives, i.e. type I errors). This increased specificity is achieved by incorporating the quantitative aspects of the Sequest cross correlation (Xcorr) into the spectral counting method. ProtQuant also computes the statistical significance of differential expression between control and treatment for each protein using one-way ANOVA (α ≤ 0.05). This method requires at least 3 peptides for each protein, from the combination of the control and treatment, before a p-value is calculated.
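The idea of weighting spectral counts by Xcorr and then testing for a group difference can be sketched as below. This is not the ProtQuant code: it simply sums Xcorr per protein within each replicate and computes a one-way ANOVA F statistic by hand. The function names and the replicate values are hypothetical.

```python
from collections import defaultdict

def sum_xcorr(spectra):
    """Sum Xcorr over all identified spectra of each protein in one replicate.

    `spectra` is an iterable of (protein_id, xcorr) pairs.
    """
    totals = defaultdict(float)
    for protein, xcorr in spectra:
        totals[protein] += xcorr
    return dict(totals)

def enough_peptides(n_control, n_treatment, minimum=3):
    """At least `minimum` peptides across control and treatment combined."""
    return n_control + n_treatment >= minimum

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of replicate groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(x for g in groups for x in g) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    if ss_within == 0:
        return float("inf"), df_between, df_within
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical summed-Xcorr values for one protein in three replicates per group.
high = [10.0, 11.0, 12.0]
low = [4.0, 5.0, 6.0]
f_stat, df1, df2 = one_way_anova_f([high, low])
```

For two groups of three replicates the degrees of freedom are (1, 4), for which the tabulated 5% critical value of F is about 7.71; an F statistic above this value corresponds to p < 0.05.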

Gene Ontology Annotation

We used Gene Ontology (GO) resources and tools available at AgBase [57] to identify the molecular functions and biological processes represented in differentially expressed proteins in our datasets. We used the GORetriever tool to obtain all existing GO annotations available for known proteins in our datasets. We first GO-annotated differentially expressed proteins in our datasets using existing annotations from probable orthologs with ≥90% sequence identity using the UniRef 90 database. Proteins without annotation at UniRef 90, but with 70–90% sequence identity to presumptive orthologs with GO annotation, were GO-annotated using the GOanna tool [22]. Biological process annotations for these proteins were grouped into more generalized categories using the GOSlim viewer [22].

Modeling using Ingenuity pathway analysis

To gain insights into the biological pathways and networks that are significantly represented in our proteomic datasets we used Ingenuity Pathways Analysis (IPA; Ingenuity Systems, California). Currently IPA accepts gene/protein accession numbers from human, mouse, and rat only. Therefore, to use IPA, we mapped bovine proteins from our datasets to their corresponding human orthologs by identifying reciprocal-best-BLAST hits and uploaded these accession numbers into IPA. IPA selects "focus genes" to be used for generating biological networks. Focus genes are based on proteins from our datasets that are mapped to corresponding gene objects in the Ingenuity Pathways Knowledgebase (IPKB) and are known to interact with other genes based on published, peer reviewed content in the IPKB. Based on these interactions IPA builds networks with a size of no more than 35 genes or proteins. A P-value for each network and canonical pathway is calculated according to the fit of the user's set of significant genes/proteins. IPA computes a score for each network from the P-value; the score indicates the likelihood of the focus genes in a network being found together due to chance. We selected networks scoring ≥ 2, which have > 99% confidence of not being generated by chance [58, 59].
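The ortholog mapping step can be illustrated with a minimal reciprocal-best-hit sketch. It assumes BLAST results have already been reduced to (query, subject, bit score) triples in both directions; the accession strings are placeholders. The `network_score` helper assumes the score is the negative base-10 logarithm of the P-value, which is consistent with a score of 2 corresponding to 99% confidence but is our assumption about how IPA derives it.

```python
import math

def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bit_score) triples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(bovine_vs_human, human_vs_bovine):
    """Keep bovine-human pairs that are each other's best hit."""
    forward = best_hits(bovine_vs_human)
    reverse = best_hits(human_vs_bovine)
    return {b: h for b, h in forward.items() if reverse.get(h) == b}

def network_score(p_value):
    """Assumed relation between a network P-value and its score."""
    return -math.log10(p_value)

# Placeholder identifiers, not real accessions.
fwd = [("bov1", "hum1", 250.0), ("bov1", "hum2", 90.0), ("bov2", "hum2", 120.0)]
rev = [("hum1", "bov1", 245.0), ("hum2", "bov1", 95.0), ("hum2", "bov2", 80.0)]
orthologs = reciprocal_best_hits(fwd, rev)
```

In this toy example only the bov1/hum1 pair is retained, because the best bovine hit of hum2 is bov1 rather than bov2.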

Biological functions are assigned to each network using annotations from the scientific literature that are stored in the IPKB. Fisher's exact test is used to calculate the P-value, which gives the probability of each biological function/disease or pathway being assigned by chance. We used P ≤ 0.05 to select highly significant biological functions and pathways represented in our proteomic datasets [58].
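For readers unfamiliar with the test, a right-tailed Fisher's exact test for pathway over-representation can be written with the hypergeometric distribution as below. We assume a one-sided (right-tailed) test; the function name and the numbers in the example are hypothetical and are not taken from the IPA output.

```python
from math import comb

def fisher_right_tail(k, n, K, N):
    """P(X >= k) for a hypergeometric variable X.

    N: proteins in the reference set
    K: reference proteins annotated to the pathway or function
    n: proteins in the dataset
    k: dataset proteins annotated to the pathway or function
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical example: 2 of 3 dataset proteins fall in a 3-protein pathway
# drawn from a reference set of 10 proteins.
p_value = fisher_right_tail(k=2, n=3, K=3, N=10)
significant = p_value <= 0.05
```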