Background

Sperm storage by females is widespread throughout the animal kingdom [1, 2] but surprisingly little is known about how females are able to keep sperm cells viable over prolonged periods of time. In many species, females provide specialized morphological structures for sperm storage, often known as spermathecae [3]. Females 'interact' with and 'sustain' sperm that are stored in these structures through glandular secretions, produced, for example, by the spermathecal glands [4]. In the honeybee Apis mellifera, these secretions contain proteins, metabolites and other chemicals [5], and spermathecal fluid has recently been shown to maintain sperm viability [6, 7]. Several proteins have been proposed to be responsible for this effect, such as the glycolytic enzyme triosephosphate isomerase [5] and a number of antioxidant defense enzymes [8]. In addition, high K+ concentrations and the high pH of the spermathecal fluid have been proposed to lower the metabolic rate of sperm in storage [5, 9, 10]. However, despite the spermatheca containing 5 to 10 mg of protein/ml [5], no systematic analysis of these female derived proteins has so far been conducted. As a consequence, our knowledge about the biochemical and physiological mechanisms that maintain sperm viability or the physiological costs associated with sperm storage is extremely limited [11]. Furthermore, females have been hypothesized to bias paternity outcomes by manipulating sperm in storage [12]. Consequently, sexual selection [13] may influence the female contributions towards stored sperm as well.

The study of male contributions towards sperm, such as seminal fluids or male accessory gland secretions, has received much more attention [14-16]. Males transfer a complex mixture of components to the female along with sperm [13, 17-21], which have multiple effects on sperm viability or female physiology [6, 7] but some of these components also seem to be agents of sexual conflict [22-25]. It seems reasonable to assume that females have also evolved a complementary arsenal of components to support and manipulate sperm. This makes detailed studies of female sperm storage physiology and its interactions with sperm and/or seminal fluid timely. A crucial step to understand female influence on stored sperm is to identify the components provided by the female, and proteomic technologies offer the opportunity to investigate the female's arsenal.

Social hymenopteran insects (the bees, ants and wasps) are interesting model systems to study sperm storage by females because several species have taken sperm storage to spectacular extremes [11, 26, 27]. This can be seen in terms of both the total number of sperm stored as well as the efficiency by which sperm are kept alive over prolonged periods of time [28]. A phenomenon common to many social hymenopteran insects is that queens only copulate during a brief period early in life [16, 29, 30]. In the absence of re-mating later in life, queens acquire and store a lifetime supply of sperm that often fixes the upper limit of a colony's size, longevity and fitness. Apart from the total number of initially stored sperm, queen lifetime fecundity is also influenced by her efficiency to keep sperm viable. Some social insect queens can not only live for several decades [26, 31], but they also maintain colonies of several million workers [11, 30, 32]. Selection is therefore expected to have maximized storage efficiency of sperm number [28] and sperm survival and minimized sperm number used per egg fertilization. Sperm storage induces costs for the female that are known to trade off with other female life history traits in leaf cutter ants [11] and bumblebees [33]. Finally, in polyandrous species, ejaculates of several males can coexist within the spermatheca for years, but it remains to be investigated whether sperm competition or cryptic female choice occurs whilst sperm is in storage [29].

We have used the honeybee, A. mellifera, and present a proteomic identification of the female's contribution towards sperm by identifying proteins that females provide to sperm in storage. Honeybee queens are efficient sperm storers that initially store around 6 million sperm for up to 7 years, giving them an estimated potential to sire up to 1.7 million offspring (see [29] for a review on the honeybee mating system). Consequently, spermathecal fluid components are expected to maximize the survival of large numbers of sperm. Furthermore, honeybee queens are highly polyandrous and store sperm from several males. Consequently, females could use sperm storage to manipulate sperm and, thus, manipulate paternity success. An additional advantage of honeybees as a model system is that the availability of the honeybee genome sequence [34] allows the use of tandem mass spectrometry (MS/MS) to identify proteins [19, 20, 35]. We here identify the spermathecal fluid proteome of honeybee and compare it to recently published proteomic profiles of sperm and seminal fluid [19, 20] in order to understand the specific female contribution to sperm in storage.

Results

The proteins of spermathecal fluid collected from dissected spermathecae were separated by one-dimensional SDS-PAGE (Figure 1). We compared this profile to extracted spermathecal wall proteins, hemolymph and sperm. In each case the protein profiles were distinct, showing that separation of these protein subsets could be achieved by our dissection and extraction protocols ([19] and data not shown). Protein profiles of spermathecal fluid were visually inspected on a total of 11 one-dimensional gels using 12 independent biological replicates for mated and 4 independent biological replicates for virgin queens. We found that specific protein profiles for spermathecal fluid can be consistently reproduced (Figure 1), in both technical and biological replicates and resemble those found in earlier studies [5, 7]. Modifications of our standardized extraction protocol resulted in no obvious abundance changes of protein profiles on the gels, indicating that our collection method is a reliable way to sample spermathecal fluid. We found a large overlap in the spermathecal fluid protein band profiles of mated and virgin queens (Figure 2). Furthermore, the protein profile of the spermathecal gland secretions is very similar to that of the spermathecal fluid, both for mated and virgin queens (Figure 2). The protein profiles of spermathecal fluid were very different from that of seminal fluid isolated from male ejaculates (Figure 2).

Figure 1

SDS-PAGE gel separation of spermathecal fluid proteins. A colloidal Coomassie blue stained gel showing a representative protein profile of spermathecal fluid. A total of 50 μl of spermathecal fluid (SF) extract was loaded on the gel. Thirty-four protein bands, as indicated by arrows, were excised for protein identification. An overview of significant protein identifications for these bands is given in Additional data file 1.

Figure 2

Spermathecal gland and spermathecal fluid proteins in mated and virgin queens. Colloidal Coomassie blue stained gel lanes showing representative protein profiles of spermathecal fluid and spermathecal gland secretions from virgin and mated queens and seminal fluid. A total of 8 μl of fluid extracts from the spermathecal samples and 16 μl of the seminal fluid sample were loaded on the gels.

To identify the most abundant proteins present in the spermathecal fluid, we ran a total of four mass spectrometry analyses from four independent biological samples. Two sets of analyses were performed, one based on in-gel digested bands of one-dimensional SDS-PAGE (Figure 1) and a second based on liquid chromatography (LC)-MS/MS analysis of total protein tryptic digests. The latter were nested experiments each consisting of six LC-MS/MS experiments performed in series, with the peptides identified in each run excluded from the subsequent analysis to improve the depth of analysis (see Materials and methods).

A summary of all significant protein identifications is given in Table 1 (protein match data are presented in Additional data file 1). Our final analysis resulted in the identification of 122 different proteins across the four spermathecal fluid samples. This set of proteins included molecular chaperones, an array of enzymes involved in energy and amino acid metabolism, antioxidant enzymes, proteins involved in signaling pathways, structural proteins, and a range of proteins with unknown functions (Table 2).

Table 1 Proteins in honeybee queen spermathecal fluid
Table 2 Proteins in honeybee queen spermathecal fluid

We compared our list of 122 spermathecal proteins with the reported abundant proteins from bee sperm samples [19]; we found that only 10 (8%) proteins were detected in both the spermathecal fluid and this list of sperm proteins (Figure 3; Additional data file 1). We also detected five of these ten sperm proteins in the spermathecal fluid of virgin queens, so it is unlikely that these are contaminating sperm proteins but instead represent the expression of the same gene that queens secrete into the spermathecal fluid. Only 5 (4%) proteins were found in sperm samples in our previous publication from male ejaculates and also in the spermathecal fluid list from mated queens presented here (Figure 3). Comparison of the spermathecal list with the top 12 most abundant hemolymph proteins we have previously detected by mass spectrometry [19] also revealed no overlap. We have also compared the protein profiles of spermathecal fluid identified here and our previous analysis of seminal fluid [19] and again found substantial differences. Only 19 (16%) out of the set of 122 spermathecal proteins were also detected in this previously reported seminal fluid proteome. Sixteen of this set of 19 proteins were also present in the spermathecal fluid of virgin queens and, thus, cannot be considered as contaminants from male seminal fluid (Table 1; Additional data file 1). This provides evidence that while qualitative assessment of seminal fluid contamination in our spermathecal fluid samples was minimal at the depth of the analysis performed, some identical proteins are present, which appear to be expressed and secreted by both males into their ejaculate and by females into the spermatheca. Our dataset of 122 proteins also allowed a comparison of the spermathecal protein population of virgin and mated queens. We detected peptides for 61 proteins present in both virgin and mated queens (Figure 3), but each group also had unique sets of proteins not found in the other. 
We found that 38 (30%) spermathecal fluid proteins were only detected in mated queens and 23 (19%) proteins were only detected in virgin queens. Protein profiles clearly differ between young virgin queens and old inseminated queens, but our study was not able to distinguish whether these proteomic changes are caused by queen age or mating status. Future work will be needed to resolve this issue; however, aged virgin queens are extremely difficult to obtain, for both physiological and technical reasons.
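The overlap figures above follow directly from the reported counts. As a minimal sketch (the variable and function names are illustrative and not part of the original analysis), the shares of the 122 identified proteins can be recomputed as follows; depending on the rounding convention, a value may differ by one percentage point from the figure quoted in the text.

```python
# Counts reported in the Results text; TOTAL is the full spermathecal protein list.
TOTAL = 122
counts = {
    "both virgin and mated": 61,
    "mated only": 38,
    "virgin only": 23,
    "shared with sperm [19]": 10,
    "shared with seminal fluid [19]": 19,
}


def percent(n, total=TOTAL):
    """Return n as a whole-number percentage of total."""
    return round(100 * n / total)


# The three virgin/mated categories should partition the full list.
partition = (
    counts["both virgin and mated"]
    + counts["mated only"]
    + counts["virgin only"]
)
assert partition == TOTAL

shares = {label: percent(n) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]} of {TOTAL} ({pct}%)")
```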

Figure 3

Spermathecal fluid proteins in virgin and mated queens. A graphical comparison of the spermathecal proteins detected in our study. The black bars show the total number of proteins that were detected in both virgin and mated queens as well as the number of proteins detected in virgins or mated queens only. The numbers of spermathecal proteins that were also found in seminal fluid and sperm are shown by grey and white bars, respectively. Half of the spermathecal proteins (50%) were found in mated as well as virgin queens, although subsets of proteins were unique for mated (30%) and virgin queens (19%). Overlaps of spermathecal proteins with those reported for sperm and seminal fluid [19] were generally low.

Spectral counts in our LC-MS/MS data from spermathecal fluid revealed that counts for particular proteins were sometimes substantially different between mated and virgin queens (Additional data file 2). This indicates that the protein concentrations might substantially differ between spermathecal fluid of mated and virgin queens. Future work is obviously needed to quantify the proteins with different spectral counts. To do this, biological replicates of spectral counts based on LC-MS/MS will be necessary, but were beyond the scope of the current study.

To further explore the metabolic network established in the spermathecal fluid, we created metabolic networks of spermathecal fluid and seminal fluid using data from the Kyoto Encyclopedia of Genes and Genomes (KEGG) [36, 37] associated with our identified proteins. This was then visualized with the Cytoscape software package [38]. The resulting networks are presented in Figure 4 (see also an annotated version provided as Additional data file 3), where colored nodes (rounded squares) represent enzymes in different functional categories, metabolites are shown as small grey circles, while the reaction is shown as connecting lines between the enzymes and metabolite nodes. The two networks differ in their degree of connectivity and the number of hubs that join multiple reactions. In the seminal fluid network there are discrete metabolic reactions leading to six clusters of reactions plus the redox reaction of disulfide isomerase. This is consistent with sperm needing only to survive for a short period in seminal fluid and the substrates necessary for these reactions being pre-charged in seminal fluid prior to ejaculation. In contrast, the spermathecal fluid is a well-connected single metabolic entity. It contains 5 of the 14 enzyme nodes present in the seminal fluid, but also an extra 23 enzyme nodes that combine the 6 clusters in the seminal fluid into a single metabolic network. Obviously, the different metabolic steps are interlinked with many products representing the substrates for other reactions. This correlates with the requirement of spermathecal fluid to maintain homeostatic functions for years, perhaps with only a small set of entry metabolites. The terminal metabolite nodes of the network are potential substrates to be transported in or out of the spermatheca, across the spermathecal wall.

Figure 4

Metabolic networks of seminal and spermathecal fluid. Visualization of spermathecal and seminal fluid metabolic networks based on the proteins identified in this study and Baer et al. [19]. Colored nodes (rounded squares) represent enzymes in different functional categories, metabolites are shown as small grey circles, and reactions are shown as connecting lines between the enzyme and metabolite nodes. Additional data file 3 provides Enzyme Commission (EC) numbers and metabolite names for all features that are noted. The seven enzymes in common between the two datasets are highlighted by increased size, and red outlines indicate nodes with a consistent spatial arrangement in both networks.

The spermathecal network shows the key features of biochemistry needed for sperm protection and maintenance. It shows a near complete glycolytic pathway that is absent from the seminal fluid and a large series of components for a vacuolar-like proton-pumping ATPase. It also contains a variety of antioxidant defenses, most shared with seminal fluid proteins, although it is often different gene products that catalyze the same reactions. These three subnetworks (glycolysis, the ATPase and the antioxidant defenses) interact through common ATP/ADP and NAD(P)/NAD(P)H pools (Additional data file 3).

Protons (H+) are presented as metabolites here and are heavily connected nodes (Additional data file 3); we kept these in the network given that metabolic maintenance of pH may be an important function in spermathecal fluid [39]. However, removal of this 'currency metabolite' [40] does not significantly break the highly interconnected structure of the spermathecal fluid network, but it does further fragment the seminal fluid network (data not shown).
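The effect of removing a currency metabolite can be illustrated by counting connected components before and after the node is dropped. The following is a minimal sketch on two hypothetical toy networks (the node names E1 to E3 and A to C are invented for illustration and do not correspond to the networks in Figure 4); it is not the analysis performed in Cytoscape.

```python
def count_components(edges, drop=frozenset()):
    """Count connected components of an undirected graph.

    Nodes listed in `drop` are removed first; nodes left without any
    edge still count as their own (singleton) component.
    """
    nodes = {n for edge in edges for n in edge} - set(drop)
    graph = {n: set() for n in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            graph[a].add(b)
            graph[b].add(a)

    seen, count = set(), 0
    for start in graph:
        if start in seen:
            continue
        count += 1
        stack = [start]  # depth-first walk over one component
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph[node] - seen)
    return count


# Hypothetical enzyme-metabolite edges. In the first network the enzymes
# share ordinary metabolites; in the second only H+ links E1 and E2.
well_connected = [("E1", "A"), ("E1", "B"), ("E2", "B"), ("E2", "C"),
                  ("E3", "C"), ("E3", "A"),
                  ("E1", "H+"), ("E2", "H+"), ("E3", "H+")]
fragmented = [("E1", "A"), ("E1", "H+"), ("E2", "B"), ("E2", "H+"),
              ("E3", "C")]

for name, net in [("well connected", well_connected),
                  ("fragmented", fragmented)]:
    with_h = count_components(net)
    without_h = count_components(net, drop={"H+"})
    print(f"{name}: {with_h} component(s) with H+, {without_h} without")
```

In the first toy network the component count is unchanged when H+ is removed, whereas the second breaks into more pieces, which mirrors the qualitative contrast described for the spermathecal and seminal fluid networks.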

Discussion

The first large-scale identification of proteins that are present in the spermathecal fluid of honeybee queens is an essential step in uncovering the molecular regulation of long-term sperm storage. A comparison of identified protein lists between our spermathecal fluid samples and those from sperm and hemolymph revealed surprisingly little overlap. Our analysis of spermathecal fluid of virgin queens, which could not have been contaminated with sperm proteins, allowed us to further decrease the number of possible sperm contaminants to only five proteins that we subsequently removed from our final list to avoid any form of contamination from stored sperm. The detection of these remaining sperm proteins in spermathecal fluid does not necessarily result from contamination, as proteins might be expressed in both locations in vivo. Information about the proteinaceous contributions of females towards stored sperm is still very limited. An expressed sequence tag analysis in Drosophila detected 42 transcripts that are enriched for expression in the spermatheca [41] but we noted that only 3 proteins within the honeybee spermathecal proteome list had significant sequence similarity to proteins predicted from these Drosophila transcripts. A set of 19 genes highly expressed in spermathecae were identified during analysis of the Hr39 gene in Drosophila, which is reported to regulate Drosophila female reproductive tract development and function [42]. While there are orthologs for most of these proteins in Apis, only one of the Drosophila genes highly expressed in spermathecae (Hsc70-4) has orthologs in our protein set. These orthologs are among the heat shock protein molecular chaperones (Table 2). Recently released microarray analysis of virgin and mated spermatheca from Drosophila [43, 44] reveals a large number of spermatheca enriched transcripts. Sequence comparison with the Apis spermathecal proteins in Table 2 reveals that approximately 47% of the corresponding genes in Drosophila have significant spermatheca-enriched expression patterns, while a further 30% have significantly spermatheca-depleted expression patterns (Additional data file 4).

The spermathecal fluid proteins of the honeybee differ substantially from those we have reported in seminal fluid [19], supporting the idea that selection on seminal and spermathecal fluid was substantially different. Seminal fluid was selected to increase insemination and paternity success whereas spermathecal fluid evolved to maximize sperm survival. Nevertheless, we were surprised by the finding of a small 20% overlap between these two protein sets (Figure 2 and Table 1) given that seminal fluid and spermathecal fluid are expected to also share common roles, such as keeping sperm alive, reducing oxidative stress, nourishing sperm or protecting sperm from microbial attacks. The network analysis shows that while different proteins are involved, many biochemical classes and enzymatic functions are the same in both fluids. Indeed, previous research in ants [6] and honeybees [7] shows that both spermathecal fluid and seminal fluid keep sperm viable, but we here show that the specific proteins to achieve this differ substantially between the male and female. Sperm is clearly able to survive in both of these 'habitats' but it might have to undergo developmental changes at the beginning of its storage to achieve this. Our finding that spermathecal fluid of virgins, which are anticipating freshly ejaculated sperm to arrive in the spermatheca, differs, in part, from that in mated queens (Figure 3), where sperm has been stored for several months, supports this idea. Consequently, the sperm storage process might be more complicated than assumed so far, and may involve a period of adjustment when the female partially mimics the seminal fluid environment but then modifies the conditions. This may minimize the energetic costs of sperm storage over time or select for specific sperm traits and thereby manipulate the paternity success of her mates.

Some of the components of the spermathecal fluid are likely linked to the need for protection of the sperm from damaging infections or damaging chemical substances that might be detrimental to long term storage. For example, several chitinases were found that might be used in defense for degrading fungal cell walls [RefSeq Gi 66514614, 110760993, 66511507]. We also found an elaborate antioxidant defense system of nine different enzymes, including defenses against superoxide, hydrogen peroxide and lipid peroxides, that likely help prevent oxidative damage to sperm during their substantial hiatus. This is consistent with the evidence of a high activity of several antioxidant defense enzymes in spermathecal fluid [8]. Also, we found a number of chelating proteins, several with roles in Fe2+ binding, which again may represent an antioxidant defense by preventing metal-catalyzed reactive oxygen species production and/or a scavenging of metals to prevent their use in the growth and proliferation of bacterial or fungal infections.

The most prominent aspect of the spermathecal metabolic network is glycolytic metabolism, which is a pathway for fructose degradation to organic acids and the production of both NADH and ATP (Figure 4). NADH will be needed to fuel the antioxidant enzymes noted above. ATP from this extracellular glycolysis could be used to fuel the vacuolar-like ATPase (Table 1). In many animal cell types, such an ATPase normally hydrolyzes cellular ATP and is used to pump protons out of cells, leading to raising of cellular pH and activation of K+ influx channels that replace the expelled H+ with K+ [45, 46]. The long established basic pH and high K+ concentration in the spermatheca that has been hypothesized to slow sperm metabolic rate [47, 48] could be catalyzed by such an ATPase pump activity. However, to our knowledge, such pumps have not been reported to operate in the direction required here, raising extracellular pH, so the link between vacuolar-like ATPases and the spermathecal pH and K+ concentrations requires more research.

An intriguing possibility is that this glycolytic pathway is also feeding carbon substrates to the sperm to maintain their own internal metabolism. Fructose as a carbon source seems to be of specific importance for honeybees [49] and the dominance of glycolytic pathway proteins in male reproductive organs has been reported earlier [20]. Klenk et al. [5] previously identified the glycolytic enzyme triosephosphate isomerase as a mating enhanced component of the honeybee spermathecal fluid. Together, these findings provide significant evidence for an extracellular glycolytic pathway operating in the spermathecal fluid. This could suggest a change in primary carbon substrate for sperm, because in seminal fluid they are fueled by their own internal energy stores. This switch in substrates may be critical in establishing a new, slower metabolic rate required for long-term homeostasis in the spermatheca.

Conclusions

Our large-scale identification of proteins within the spermathecal fluid of honeybee queens offers an intriguing insight into the details of female sperm storage. Our data indicate that females provide stored sperm with a complex mixture of proteins that form a metabolically connected network. They also suggest that some essential physiological requirements of sperm have effectively been 'outsourced' and are now provided by the female. In this respect, sperm storage could be regarded as a specialized form of endosymbiosis between males and females, post-copulation but pre-fertilization.

Materials and methods

Sample preparation

Spermathecal fluid was collected by dissecting virgin and mated queens using a Leica stereo microscope at 40× to 62× magnification. All dissections were performed with fine watchmaker forceps (INOX 5, Biology) and in Hayes solution (9.0 g/l NaCl, 0.2 g/l CaCl2, 0.2 g/l KCl, 0.1 g/l NaHCO3, pH 8.7). Spermathecal fluid was sampled from a total of 206 mated and 64 virgin queens. Mated queens were egg laying mother queens at least 9 months of age and were provided by several local beekeepers. Virgin queens were obtained by grafting and used at an average age of 6 days, being the age when queens typically perform their nuptial flights. To sample spermathecal fluid, queens were briefly anesthetized in CO2 for 20 to 30 seconds after which their spermathecae were immediately dissected and transferred to a drop of Hayes solution. The dense tracheal network surrounding the spermatheca was carefully removed. The spermatheca was then washed in a second drop of Hayes to minimize contamination by hemolymph. The spermatheca was then placed on a microscope slide. After the removal of remaining Hayes an injection needle was used to pierce a small hole in the spermathecal wall. The spermathecal fluid was then collected out of the lumen using a fine glass capillary. For each biological sample we pooled samples from 20 to 30 queens. For the samples from mated queens spermathecal fluid was separated from the surrounding stored sperm by centrifugation for 25 minutes at 850 × g at 4°C. The supernatant (spermathecal fluid) was collected and centrifuged again at 18,620 × g for 10 minutes at 4°C to remove remaining sperm. Samples from virgin queens were briefly centrifuged at 10,000 × g but not processed any further and all spermathecal fluid samples were frozen at -80°C prior to further analyses. To collect secretions of the spermathecal glands, we collected up to 20 glands for each biological sample and kept them in 50 μl of Hayes on ice. The glands were then carefully opened at their distal end using watchmaker forceps to allow the gland content to dissolve into the surrounding solution. Separation of the gland tissue from the dissolved gland secretions was done by centrifugation for 20 minutes at 850 × g and at 4°C.

Protein profiling using gel electrophoresis

Profiling of spermathecal fluid proteins was performed by SDS-PAGE using either Bio-Rad Criterion precast gels (10 to 20% (w/v) acrylamide, HCl, 1 mm, 18 comb; Bio-Rad, Hercules, CA, USA) or larger 12% (w/v) acrylamide homemade slab gels. Gels were run at 30 mA, fixed in fixing solution (40% methanol, 10% acetic acid) for an hour and stained overnight with colloidal Coomassie blue (G 250). Gels were kept in 0.5% (v/v) phosphoric acid at 4°C prior to protein identifications using peptide mass spectrometry.

Identification of proteins from gels using tandem mass spectrometry

Colloidal Coomassie blue stained protein spots were cut from gels and destained twice in 10 mM Na2HCO3 with 50% (v/v) acetonitrile. Samples were dried at 50°C before being rehydrated with 15 μl of digestion solution (10 mM NH4CO3 with 12.5 μg/ml trypsin (Invitrogen, Carlsbad, CA, USA) and 0.01% (v/v) trifluoroacetic acid) and incubated overnight at 37°C. Peptides produced from trypsinization were twice extracted from gel plugs using 15 μl acetonitrile. The supernatant was then collected and plugs washed twice with 15 μl of 50% (v/v) acetonitrile and 5% (v/v) formic acid and combined with initial supernatant. The pooled extracts were dried by vacuum centrifugation and stored at 4°C before being analyzed by mass spectrometry.

Gel spot protein identifications

Samples from excised gel pieces were analyzed on an Agilent XCT Ultra IonTrap mass spectrometer with an electrospray ionization (ESI) source equipped with a low flow nebuliser in positive mode and controlled by Chemstation (rev. B.01.03 [204]; Agilent Technologies, Santa Clara, CA, USA) and MSD Trap Control software version 6.1 (Bruker Daltonik GmbH, Bremen, Germany). Peptides were eluted from a self-packed Microsorb (Varian Inc., Palo Alto, CA, USA) C18 (5 μm, 100 Å) reverse phase column (0.5 × 50 mm) using an Agilent Technologies 1100 series capillary liquid chromatography system at 10 μl/minute using a 9 minute acetonitrile gradient (5 to 60% (v/v)) in 0.1% (v/v) formic acid at a regulated temperature of 50°C. The method used for initial ion detection utilized a mass range of 200 to 1,400 m/z with scan mode set to 'standard' (8,100 m/z per second) and ion charge control conditions set at 250,000 and 3 averages taken per scan. Smart mode parameter settings were employed using a target of 800 m/z, a compound stability factor of 90%, a trap drive level of 80% and optimize set to 'normal'. Ions were selected for MS/MS after reaching an intensity of 80,000 cps and two precursor ions were selected from the initial mass spectrometry scan. MS/MS conditions employed SmartFrag for ion fragmentation, a scan range of 70 to 2,200 m/z using an average of 3 scans, the exclusion of singly charged ions option and ion charge control conditions set to 200,000 in Ultra scan mode (26,000 m/z per second). Resulting MS/MS spectra were exported from the DataAnalysis for LC/MSD Trap version 3.3 (build 149) software package (Bruker Daltonik GmbH) using default parameters for AutoMS(n) and compound 'export'. The resulting .mgf files were then searched as outlined below.

Whole lysate protein identifications

Spermathecal fluid proteins of mated as well as virgin queens were also analyzed with a non-gel approach, using complex mixture LC-MS/MS analysis. Spermathecal samples were digested overnight at 37°C with trypsin and insoluble components were removed by centrifugation at 20,000 × g for 10 minutes. Samples were analyzed on an Agilent 6510 quadrupole time-of-flight (Q-TOF) mass spectrometer with an HPLC Chip Cube source. The chip consisted of a 40 nl enrichment column (Zorbax 300SB-C18, 5 μm) and a 150 mm separation column (Zorbax 300SB-C18, 5 μm) driven by an Agilent Technologies 1100 series nano/capillary liquid chromatography system. Both systems were controlled by MassHunter Workstation Data Acquisition for Q-TOF (version B.01.02, build 65.4, Patches 1,2,3,4; Agilent Technologies). Peptides were loaded onto the trapping column at 4 μl/minute in 5% (v/v) acetonitrile and 0.1% (v/v) formic acid with the chip switched to enrichment and using the capillary pump. The chip was then switched to separation and peptides eluted during a 1 h gradient (5% acetonitrile to 40% acetonitrile) directly into the mass spectrometer. The mass spectrometer was run in positive ion mode and scans run over a range of 275 to 1,500 m/z and at 4 spectra/second. Precursor ions were selected for auto MS/MS at an absolute threshold of 500 and a relative threshold of 0.01, with a maximum of 3 precursors per cycle, and active exclusion set at 2 spectra and released after 1 minute. Precursor charge-state selection and preference was set to 2+ and then 3+ and precursors selected by charge then abundance. Resulting MS/MS spectra were opened in MassHunter Workstation Qualitative Analysis (version B.01.02, build 1.2.122.1, Patches 3; Agilent Technologies) and MS/MS compounds detected by 'Find Auto MS/MS' using default settings. The resulting compounds were then exported as mzdata files that were then searched as outlined below.

Database searching

Mass spectra output files were analyzed against the predicted A. mellifera peptide set (PreRelease2, 11,069 sequences; 5,989,390 residues) from BeeBase [50] using the Mascot search engine version 2.2.03 (Matrix Science, Boston, MA, USA). Gel spot searches were conducted utilizing error tolerances of ± 1.2 Da for MS and ± 0.6 Da for MS/MS, 'Max missed cleavages' set to 1, the Oxidation (M) variable modification, the instrument set to ESI-TRAP and peptide charge set at '2+ and 3+'. Results were filtered using 'Standard scoring', 'Max. number of hits' set to 20, 'Significance threshold' at P < 0.05. Complex lysate searches were conducted utilizing error tolerances of ± 100 ppm for MS and ± 0.5 Da for MS/MS, 'Max missed cleavages' set to 1, the Oxidation (M) variable modification, the instrument set to ESI-Q-TOF and peptide charge set at '2+ and 3+'. Results were filtered using 'MUDPIT scoring', 'Max. number of hits' set to 20, 'Significance threshold' at P < 0.05. Lists of the spermathecal fluid protein sets identified for the various samples and scores for matches are provided as Additional data file 1. To build the protein list, we applied conservative approaches to minimize false positives. Protein matches were only claimed if at least two distinct peptides were detected per protein and the MOWSE (molecular weight search) score was higher than 50 (a score >37 corresponds to the P < 0.05 significance level). False discovery rate analysis of the trypsin digested spermathecal fluid samples from virgin and mated queens against a decoy randomized A. mellifera protein set (PreRelease2, 11,069 sequences; 5,989,390 residues) revealed a <2.5% false discovery rate for the virgin queen sample and a <2.5% false discovery rate for the sample from mated queens.
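The acceptance criteria described above (at least two distinct peptides and a score higher than 50) can be expressed as a simple filter. The sketch below is illustrative only: the record layout, the accession labels and the peptide strings are hypothetical, and the decoy-based false discovery rate is computed as decoy matches divided by target matches, which is one common estimate and is assumed here rather than taken from the search engine's documentation.

```python
from dataclasses import dataclass

MIN_DISTINCT_PEPTIDES = 2  # at least two distinct peptides per protein
MIN_SCORE = 50             # score must be strictly higher than this


@dataclass(frozen=True)
class ProteinMatch:
    accession: str    # hypothetical identifier, not a real database entry
    score: float      # protein-level score reported by the search
    peptides: tuple   # peptide sequences assigned to this protein


def passes(match):
    """Apply the two-peptide and score > 50 criteria to one match."""
    distinct = len(set(match.peptides))
    return distinct >= MIN_DISTINCT_PEPTIDES and match.score > MIN_SCORE


def false_discovery_rate(decoy_hits, target_hits):
    """Decoy-based estimate (assumed form): decoy hits / target hits."""
    if target_hits <= 0:
        raise ValueError("target_hits must be positive")
    return decoy_hits / target_hits


# Hypothetical search results used only to exercise the filter.
matches = [
    ProteinMatch("HYPO_A", 120.0, ("AAK", "LLR", "AAK")),  # 2 distinct, kept
    ProteinMatch("HYPO_B", 45.0, ("GGK", "TTR", "SSK")),   # score too low
    ProteinMatch("HYPO_C", 80.0, ("VVK",)),                # single peptide
    ProteinMatch("HYPO_D", 50.0, ("MMK", "PPR")),          # not above 50
]

accepted = [m.accession for m in matches if passes(m)]
print("accepted:", accepted)

# Hypothetical counts: 2 decoy matches against 100 target matches.
fdr = false_discovery_rate(2, 100)
print(f"estimated FDR: {fdr:.1%}")
```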

Each protein sequence identified from the Apis protein set was submitted to a BLAST search to identify homologous proteins from insects and other organisms. This process was used to confirm or modify the functional annotation of the proteins from the PreRelease2 dataset, and then each protein was placed into a functional category according to its annotation and manual literature searches where necessary.

Network analysis and visualization

From the KEGG database [36, 37] of biochemical pathways, proteins identified in the present study (Table 1) and [19] were associated with unique ID 'dame' entries specific to A. mellifera enzymes. Following this step, enzyme commission (EC) numbers, enzyme names and reactions associated with these KEGG IDs, where these exist, were extracted with a Perl script from the 'enzyme' file downloaded from the KEGG ftp site [51]. Proteins for which no EC number could be assigned typically have unknown function or are responsible for non-enzymatic processes. A total of 41 of the honey bee proteins (using the PreRelease2 accession numbers) in our spermathecal set shown in Table 1 were assigned EC numbers in this manner, making a non-redundant set of 33 enzyme nodes and 70 metabolites. Similarly, seminal fluid proteins from [19] yielded a non-redundant set of 16 enzyme nodes and 47 metabolites.

After the recovery of these data, the set of unique EC numbers and biochemical reactions was parsed to generate a simple interaction format (SIF) file to represent a metabolic network. The SIF file and other data, such as GB codes associated with EC numbers, enzyme names, and node types (enzyme or metabolite), were loaded into the Cytoscape software (version 2.6.0) [38] for network visualization and analysis. Network images were exported from Cytoscape as .svg files, imported into Adobe Illustrator and modified visually for presentation purposes.
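The conversion of enzyme-reaction data into a SIF file can be sketched as follows. This is not the original script: the function name, the relation label 'rc' and the abbreviated metabolite names are assumptions made for illustration, and the two example reactions (triosephosphate isomerase, EC 5.3.1.1, and superoxide dismutase, EC 1.15.1.1) are standard textbook entries rather than rows taken from the authors' files. Each SIF line has the form source node, interaction type, target node.

```python
def reactions_to_sif(reactions, relation="rc"):
    """Turn {EC number: metabolites} into sorted, de-duplicated SIF lines.

    Each line links one enzyme node to one metabolite node; the fields
    are tab separated. `relation` is an arbitrary label chosen here.
    """
    lines = set()
    for ec_number, metabolites in reactions.items():
        for metabolite in metabolites:
            lines.add(f"{ec_number}\t{relation}\t{metabolite}")
    return sorted(lines)


def node_types(reactions):
    """Build a node attribute table marking enzymes and metabolites."""
    types = {ec_number: "enzyme" for ec_number in reactions}
    for metabolites in reactions.values():
        for metabolite in metabolites:
            types.setdefault(metabolite, "metabolite")
    return types


# Illustrative input: substrates and products of two well-known reactions.
example = {
    "5.3.1.1": ["glyceraldehyde-3P", "glycerone-P"],
    "1.15.1.1": ["superoxide", "H+", "O2", "H2O2"],
}

sif_lines = reactions_to_sif(example)
types = node_types(example)

print("\n".join(sif_lines))
```

Writing `sif_lines` to a text file, one line per edge, gives a network file of the kind described above, while the `types` table corresponds to the separate node attribute data (enzyme or metabolite).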

Additional data files

The following additional data are available with the online version of this paper: a table showing identification of proteins in honeybee spermathecal fluid by MS/MS analysis of two one-dimensional gels and two gel-free analyses of tryptic peptides (Additional data file 1); a table listing peptide counts from mated and virgin spermathecal fluid using tandem MS analysis of gel-free analyses of tryptic peptides (Additional data file 2); a figure illustrating the metabolic network of seminal and spermathecal fluid (Additional data file 3); a table listing abundances of Drosophila transcripts with sequence similarity to the proteins found in Apis spermatheca (Additional data file 4).