Introduction

While biologists generally focus on the protein-coding open reading frame (ORF) of mRNA, it is now emerging that many mRNAs, and even noncoding RNAs, also possess small ORFs (sORFs) that have significant roles in different organisms (Chu et al. 2015). It is generally accepted that the ORFs in a transcribed mRNA are translated into the corresponding proteins through in-frame codons delimited by start and stop codons. However, distinguishing bona fide proteins from translational noise remains a major challenge in gene annotation. Moreover, most ORF-finding algorithms have historically set 300 nucleotides as the minimum ORF size for gene annotation, which misclassifies RNAs that encode genuine small proteins as noncoding RNAs (ncRNAs). Owing to advances in bioinformatics and biotechnology, numerous large-scale genomic studies have identified many nonclassical protein-coding genes that were previously thought to be noncoding (Aramayo and Polymenis 2017; Bazzini et al. 2014; Chew et al. 2013; Derrien et al. 2012; Ingolia et al. 2011, 2014; Makarewich and Olson 2017; Tautz 2009; Ulitsky et al. 2011). A growing number of studies find that sORFs in ncRNAs can encode small peptides, often referred to as small ORF-encoded peptides (SEPs), that play important roles in fundamental biological processes and in the maintenance of cellular homeostasis in different organisms, such as yeast, plants, zebrafish, Drosophila, and mammals (Anderson et al. 2015, 2016; Bazzini et al. 2014; Cohen 2014; Hanada et al. 2013; Ingolia et al. 2011; Ji et al. 2015; Lee et al. 2015; Magny et al. 2013; Matsumoto et al. 2017; Nelson et al. 2016; Smith et al. 2014).

Serum is the most important body fluid in mammals and contains many important but low-abundance small proteins, such as peptide hormones, growth factors, lymphokines, and cytokines. However, few studies have examined the existence and bioactivity of SEPs in serum. The major challenge in discovering serum SEPs arises from the extraordinary complexity of serum protein composition, compounded by post-translational modifications (PTMs) and protein variability, as well as the wide concentration range (more than ten orders of magnitude) (Anderson and Anderson 2002; Omenn 2007).

To characterize the existence and bioactivity of SEPs in serum, we first established a mouse SEP database. This SEP database was then merged with the mouse UniProt database and a Contamination database to form the Mouse Merged database (MMD) for mass spectrometry (MS) data mining in this study. In parallel, we extracted low-molecular-weight proteins from different mouse sera and subjected them to Q Exactive MS detection. After data mining, we discovered 54 novel SEPs in 15 serum samples. Furthermore, we raised antibodies against four representative SEPs and confirmed the existence of two of them at the biochemical level.

Results

Construction and verification of Mouse Merged database

To characterize the existence of SEPs in serum, a novel mouse SEP database was constructed from the RNA transcripts released by GENCODE (vM4). This database contained about half a million putative translated SEPs in mouse. It was then combined with the mouse UniProt database and a Contamination database to form the MMD (Fig. 1). To verify the quality of the MMD, several recently identified functional SEPs were chosen and searched against the MMD by BLAST. Each of them matched an entry in the MMD (Table 1). For example, MOTS-c is derived from an sORF in mitochondrial DNA and regulates insulin sensitivity and metabolic homeostasis (Lee et al. 2015). MLN, a conserved skeletal muscle-specific micropeptide, is derived from an sORF in a putative long noncoding RNA (lncRNA) and regulates skeletal muscle physiology (Anderson et al. 2015). SPAR is derived from an sORF in an lncRNA and inhibits muscle regeneration (Matsumoto et al. 2017). NoBody, a novel component of the mRNA decapping complex, is derived from an sORF in an lncRNA (D’Lima et al. 2017). Together, these results indicate that we have constructed a high-quality MMD for the subsequent MS data mining.

Fig. 1

Construction of the Mouse Merged database. Around 110,000 mouse transcripts were obtained from GENCODE (vM4). All transcripts, except the known coding transcripts, were translated to SEPs (length between 8 and 100 a.a.) by the ORF Finder program and an in-house program. The SEP database was then merged with the mouse UniProt database and a Contamination database to form the Mouse Merged database for mass spectrometry data mining in this study

Table 1 Verification of the MMD

ob/ob mice show severely impaired glucose tolerance

As the composition and concentration of serum proteins vary widely, wild-type (WT) and ob/ob mice were chosen for serum protein preparation. We reasoned that some serum SEPs might show different expression patterns between WT mice and a pathological mouse model. The body weight of ob/ob mice was significantly higher than that of WT mice, consistent with previous studies (Fig. 2A). We then verified the glucose metabolism status of these two mouse models. ob/ob mice showed severely impaired glucose tolerance (Fig. 2B, C). These results show that the two mouse models have dramatically different metabolic states.

Fig. 2

Verification of ob/ob mice. Twelve-week-old male mice were chosen for experiments in this study. A The weight of WT and ob/ob mice. B The IPGTT for the WT and ob/ob mice. All mice were fasted for 18 h before the IPGTT. Glucose (2 g/kg) was injected for the IPGTT. C Area under the curve was calculated for the IPGTT. WT mice, n = 5; ob/ob mice, n = 6. Data were analyzed by Student's t test and are presented as mean ± SEM. Significance, *p < 0.05; ***p < 0.001

We next extracted the serum proteins of WT and ob/ob mice according to the workflow shown in Fig. 3A. Since the putative serum SEPs might be of very low abundance and highly dynamic, 11 WT mouse samples and four ob/ob mouse samples were chosen for the subsequent MS detection (Fig. 3B, C). As shown in Fig. 3B and C, all serum samples showed a clear protein staining signal in the low molecular weight range. We then excised the gel area below 14 kDa for the subsequent sample preparation and MS (the area between the two red lines in every lane).

Fig. 3

Working procedure for serum SEP detection. According to the workflow for the enrichment and identification of low-abundance mouse serum proteins (A), mouse serum proteins were separated by SDS-PAGE in two rounds and stained with Colloidal blue. B The first round, seven WT mice. C The second round, four WT mice and four ob/ob mice. The proteins below 14 kDa (the proteins between the two red lines in every lane) were excised for mass spectrometric analysis

MS detection discovers novel serum SEPs

The gel slices were further processed for MS detection according to the workflow in Fig. 3A. In total, 54 novel SEPs were detected across the 15 samples (Table 2). Eight SEPs were detected in more than one sample. Thirty-eight SEPs were detected only in WT mouse serum and 12 only in ob/ob mouse serum (Table 2). We named the SEPs sequentially from SEP1 to SEP54 (Table 2). SEP3, SEP12, SEP33, and SEP54 were chosen for further study to confirm the accuracy of the Q Exactive MS results and to verify the existence of the SEPs in serum. The MS/MS spectra of these four SEPs are presented in Fig. 4. SEP3 was detected in Sample 1 and Sample 2 and was encoded by an sORF in a processed_transcript of the Epha7 gene (Table 2, Fig. 4A). In addition, SEP3 was conserved in mammals (Fig. 5A). SEP12 was detected in five samples and was encoded by an sORF in a processed_transcript of the Ufsp2 gene (Table 2, Fig. 4B). SEP12 was also conserved in mammals (Fig. 5B). SEP33 was detected in four samples and was encoded by an sORF in a retained_intron transcript of the Tnnt2 gene (Table 2, Fig. 4C). SEP54 was detected only once, with a high cross-correlation score, in Sample 15 and was encoded by an sORF in the lncRNA Gm2670 (Table 2, Fig. 4D). All four primary MS results strongly support the detection of the targeted peptides. Taken together, these lines of evidence suggest that SEPs are widespread in serum and might vary considerably among individuals.

Table 2 MS identification of sliced bands

Fig. 4

MS/MS spectra of the four example peptides. The matched fragment ions of the precursor ions are listed to the right of the MS/MS spectra. All matched ions are color-coded: b-ions in red and y-ions in blue. The sequences below the spectra are the corresponding full-length SEPs according to the Mouse Merged database. Red highlights represent the detected peptide fragments. A The spectrum result of SEP3. B The spectrum result of SEP12. C The spectrum result of SEP33. D The spectrum result of SEP54

Fig. 5

SEP3 and SEP12 are conserved in mammals. Conservation analysis of SEP3 (A) and SEP12 (B) by Clustal multiple alignment across six species

Western blot results confirm the existence of serum SEPs

To further confirm the existence of the SEPs, four antibodies were raised against SEP3, SEP12, SEP33, and SEP54. The antigens were designed as described in Materials and methods. Sera from the immunized rabbits were used as antibodies to detect the corresponding SEPs in mouse sera by Western blot, with human serum as a control. Consistent with the MS results, the SEP3 antibody recognized an 8-kDa protein in WT mouse serum (Fig. 6A), but not in human or ob/ob mouse serum. Similarly, the SEP54 antibody recognized a 10-kDa protein in mouse serum (Fig. 6B), but not in human serum. Furthermore, consistent with the MS result, SEP54 was more abundant in ob/ob mouse serum. However, the SEP12 and SEP33 antibodies failed to recognize any specific band in any of the serum samples (data not shown). Altogether, these results further support the existence of SEPs in serum at different expression levels.

Fig. 6

WB verification of SEPs in mouse serum. Polyclonal antibodies against four SEPs were raised in rabbits. Two antibodies showed specific bands in the low molecular weight area of mouse serum samples. A Anti-SEP3 antibody recognized a target protein at around 8 kDa, indicated by the red arrow. B Anti-SEP54 antibody recognized a target protein at around 10 kDa, indicated by the red arrow

Discussion

In this study, we constructed a novel SEP database and discovered a set of SEPs in serum by MS. Our data provide two key insights into the genome-wide expression of SEPs in mammals. First, SEPs are widely distributed and translated from a large body of transcripts. We annotated hundreds of thousands of SEPs (length ranging from 8 to 100 a.a.) from the noncoding transcripts in mouse GENCODE (vM4), and validated 54 novel SEPs in mouse serum (Table 2). To our knowledge, this is the first systematic study to explore the existence of SEPs in serum. Previous studies have successfully used computational approaches and ribosome profiling to define the transcripts bound by translating ribosomes (Bazzini et al. 2014; Chew et al. 2013; Ingolia et al. 2009, 2011; Menschaert et al. 2013) and identified some SEPs in specific tissues and cell lines. However, it has been strongly debated whether the RNA fragments protected by the ribosome always reflect actively translated transcripts. RNA bound to RNA-binding proteins, as well as RNA randomly bound by ribosomes, can also be improperly classified as coding sequence. Like the known small serum peptides, the putative serum SEPs might also be of low abundance, highly dynamic, and of low molecular weight. Therefore, a high-precision mass spectrometer and an in-house database were chosen to verify the existence of SEPs in serum.

Second, the serum SEPs were of very low abundance and highly dynamic among individuals. For example, ApoA2, a well-known high-abundance serum peptide (UniProt: P09813; length: 102 a.a.), was detected with more than ten fragments in each of the 15 samples (data not shown). In contrast, each of the 54 SEPs detected in this study matched only one fragment in the corresponding sample (Table 2), and only eight SEPs were detected in more than one sample (Table 2). The low abundance of these SEPs might also be one of the reasons for the weak Western blot signal of SEP3 and SEP54 (Fig. 6). On the other hand, insulin, a protein present in serum at the nanogram level, was not detected in the 15 samples (data not shown), which suggests that the abundance of the detected SEPs might be higher than that of insulin and further supports the existence of these SEPs in serum. In addition, the low reproducibility of detection of the 54 SEPs among the 15 samples from 11 WT mice and four ob/ob mice implies, to some degree, that serum SEPs are highly dynamic among individuals (with different metabolic states). These MS results were further verified by Western blot analysis. The SEP3 antibody detected a stronger signal in WT mouse serum than in ob/ob mouse serum, in agreement with the MS result that SEP3 was detected only in WT mouse samples. Similarly, the SEP54 antibody detected a stronger signal in ob/ob mouse serum samples than in the WT mouse sample. This dynamic behavior of SEPs is similar to that of known small peptides in serum. For example, the serum insulin level increases after feeding and the serum irisin level increases after exercise (Jedrychowski et al. 2015). Finally, signal peptide prediction showed that only three of the detected SEPs had a secretion signal peptide (see Supplemental table), which indicates that most SEPs tend to be secreted by a noncanonical pathway or released from broken cells.

Several approaches have been used to validate putative SEPs (Housman and Ulitsky 2016). Ideally, the generation of antibodies against target SEPs is the most effective method (Anderson et al. 2015). However, designing optimal antigens for SEPs is challenging because of their small size. This may be the reason why the antibodies raised against SEP12 and SEP33 could not recognize specific bands in serum samples. As the antigen peptide for SEP3 differs by 4 a.a. between human and mouse, the SEP3 antibody could not recognize the human SEP3 ortholog (Figs. 5A, 6A). Another concern with the use of antibodies is that even the highest-affinity antibody may not produce a strong enough signal to detect low-abundance SEPs. Alternatively, clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9)-mediated gene editing of the target SEPs in vitro or in vivo could also provide direct evidence for the existence of the SEPs and substantially support their functional study. In addition, for studying the biological function of serum SEPs, such as SEP3 and SEP54, mouse tail vein injection of chemically synthesized full-length SEPs would be an efficient approach.

It still remains unclear how many SEPs exist in serum and what the biological functions of the serum SEPs are. New methods and new ideas are still needed to further study the SEPs in serum. Together, our study opens a new avenue for the identification of small peptides in serum, and provides an entry point to investigate their function in vivo.

Materials and methods

Animals

Twelve-week-old male WT (C57BL/6J) and ob/ob mice were housed in our animal facility on a 12-h light/dark cycle with ad libitum access to water and food. All animal protocols were approved by the Animal Care and Use Committee of the Institute of Biophysics, Chinese Academy of Sciences, SYXK (SPF 2011-0029).

IPGTT

Tail blood samples were collected from 12-week-old mice that had been fasted for 18 h, before and at 15, 30, 60, and 120 min after i.p. injection of glucose (2 g/kg). Blood glucose was measured at each of these time points using a glucometer (ACCU-CHEK, Roche).
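The area under the curve reported for the IPGTT can be illustrated with a short Python sketch. The text does not state how the area was computed, so the trapezoidal rule used here is an assumption, as are the function name `glucose_auc` and the example glucose values, which are invented for illustration only.

```python
# Sampling times of the IPGTT in minutes (0 = before glucose injection).
IPGTT_TIMES_MIN = [0, 15, 30, 60, 120]


def glucose_auc(times_min, glucose):
    """Area under a glucose-time curve by the trapezoidal rule.

    Assumption: the paper does not name the integration method;
    the trapezoidal rule is a common choice for IPGTT data.
    """
    if len(times_min) != len(glucose) or len(times_min) < 2:
        raise ValueError("need matching time and glucose series with >= 2 points")
    auc = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        if dt <= 0:
            raise ValueError("time points must be strictly increasing")
        # Area of one trapezoid between two consecutive measurements.
        auc += dt * (glucose[i] + glucose[i - 1]) / 2.0
    return auc


if __name__ == "__main__":
    # Hypothetical glucose readings (arbitrary units), not data from the study.
    example = [5.0, 12.0, 10.0, 8.0, 6.0]
    print(glucose_auc(IPGTT_TIMES_MIN, example))
```

The resulting area has units of glucose concentration multiplied by minutes; a per-animal value computed this way can then be compared between groups.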

Mouse Merged database construction

For the construction of the SEP database, both the ORF Finder program and an in-house program were used to identify ORFs from noncoding transcripts in mouse GENCODE (vM4). Ensembl transcripts (release 73) were downloaded from the Ensembl FTP repository to annotate the noncoding transcripts in mouse GENCODE. The peptide sequences associated with predicted ORFs of noncoding RNAs, ranging from 8 to 100 a.a., were selected to construct the SEP database. The SEP database was then merged with the mouse UniProt database and a Contamination database to form the Mouse Merged database (Supplemental data).
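The ORF-scanning step can be sketched in Python as follows. This is not the in-house program or ORF Finder; it is a minimal illustration that assumes an ATG start codon, a required in-frame stop codon, the standard genetic code, forward-strand scanning only, and one peptide (the longest) per stop codon. The function name `find_seps` and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def find_seps(transcript, min_aa=8, max_aa=100):
    """Return (start, end, peptide) for ORFs encoding min_aa..max_aa residues.

    Assumptions (not stated in the text): ATG start, mandatory stop codon,
    three forward frames only, and the first ATG after the previous stop
    defines the ORF. Coordinates are 0-based, end-exclusive, stop included.
    """
    seq = transcript.upper().replace("U", "T")
    found = []
    for frame in range(3):
        residues = None  # residues of the ORF being extended, or None
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            aa = CODON_TABLE.get(seq[pos:pos + 3], "X")  # "X" = ambiguous codon
            if residues is None:
                if aa == "M":
                    residues, start = ["M"], pos
            elif aa == "*":
                if min_aa <= len(residues) <= max_aa:
                    found.append((start, pos + 3, "".join(residues)))
                residues = None
            else:
                residues.append(aa)
    return found


if __name__ == "__main__":
    # Hypothetical transcript: a 9-residue ORF in the third reading frame.
    toy = "CC" + "ATG" + "GCT" * 8 + "TAA" + "GG"
    for start, end, peptide in find_seps(toy):
        print(start, end, peptide)
```

The 8 to 100 a.a. window mirrors the length range given above; ORFs that run off the end of a transcript without a stop codon are discarded in this sketch, which is a design choice rather than something the text specifies.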

Serum sample preparation

The workflow for the preparation of serum samples is shown in Fig. 3A. Serum samples were collected from 12-week-old WT and ob/ob mice by eyeball removal. After clotting, serum was separated by centrifugation at 3000 × g for 10 min at 4 °C. The low molecular weight and low-abundance serum proteins were enriched with 60% acetonitrile as previously reported (Echan et al. 2005; Kay et al. 2008; Wu et al. 2010). Briefly, 100 μl serum was mixed with 300 μl H2O and 600 μl acetonitrile and incubated for 30 min at 4 °C. After centrifugation at 12,000 × g for 30 min at 4 °C, the supernatant was concentrated by vacuum centrifugation. The precipitate was redissolved in 100 μl H2O and deglycosylated according to the manufacturer's instructions (NEB, USA). The deglycosylated proteins were redissolved in Sample buffer (125 mmol/L Tris Base, 20% glycerol, 4% SDS, 4% β-mercaptoethanol, and 0.04% bromophenol blue) with EDTA-free protease and phosphatase inhibitors (Thermo, USA). The protein samples were further denatured at 95 °C for 5 min.

Colloidal blue staining and mass spectrometry detection

Serum protein samples were separated on 10% Tricine gels and subjected to Colloidal blue staining (Life Technologies, USA) (Schagger 2006). The indicated bands were cut into slices for MS detection (Fig. 3B, C). In-gel digestion of every slice was performed as previously described (Chen et al. 2016). The resulting peptide mixtures were dried and stored at −80 °C until LC–MS/MS analysis.

LC–MS/MS analysis of serum peptide mixtures was performed on a Q Exactive mass spectrometer with a nano-electrospray ion source (Thermo, USA) coupled to an EasyLC nano HPLC system. The digested peptides were loaded onto a C18 trap column with an autosampler and eluted onto a C18 column (100 μm × 15 cm) packed with ReproSil-Pur 130 C18-AQ 3 μm particles (Dr. Maisch HPLC GmbH, Germany).

All MS/MS spectra were acquired in a data-dependent scan mode, where one full-MS scan was followed by ten MS/MS scans. The full-scan MS spectra (300–1600 m/z) were acquired with a resolution of 60,000 at m/z 400 after accumulation to a target value of 3e6. The 20 most abundant ions found in MS1 were selected for fragmentation at a normalized collision energy of 27% (Chen et al. 2016).

The LC–MS/MS data were searched against the in-house MMD using Proteome Discoverer 1.4 with SEQUEST as the search engine (Thermo, USA). Search parameters were set as follows: enzyme: trypsin; precursor ion mass tolerance: 10 ppm; fragment ion mass tolerance: 0.02 Da. The maximum number of missed cleavages by trypsin was set to two for peptides. The variable modification was set to oxidation of methionine. The fixed modification was set to carboxyamidomethylation of cysteine.
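Two of the search parameters above, the missed-cleavage limit and the 10 ppm precursor tolerance, can be made concrete with a small Python sketch. It does not reproduce SEQUEST or Proteome Discoverer. The cleavage rule used here (cut after K or R unless the next residue is P) is a common convention and an assumption on our part, and the function names and example values are hypothetical.

```python
def tryptic_peptides(protein, max_missed=2):
    """In-silico tryptic digest allowing up to max_missed missed cleavages.

    Assumption: cleave C-terminal to K or R, but not when followed by P.
    """
    # Split the sequence into fully cleaved pieces.
    pieces, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])

    # Join consecutive pieces to model 0..max_missed missed cleavages.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + max_missed, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def within_ppm(observed, theoretical, tol_ppm=10.0):
    """True if the relative deviation is within tol_ppm parts per million."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


if __name__ == "__main__":
    # Hypothetical sequence and m/z values, for illustration only.
    print(tryptic_peptides("AKCRDK", max_missed=1))
    print(within_ppm(500.004, 500.0))  # deviation of about 8 ppm
```

Allowing two missed cleavages enlarges the candidate peptide list, which matters for short SEPs that may yield only one or two fully cleaved tryptic peptides.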

Signal peptide prediction

Phobius (http://phobius.sbc.su.se) and TargetP (http://www.cbs.dtu.dk/services/TargetP/) were used to predict SEP signal peptides. The output format was based on TargetP. A SEP was listed as positive in the supplemental table only when both methods returned a positive signal peptide prediction. The final prediction was based on the scores for mTP, SP, and other. mTP denotes a mitochondrial targeting peptide. SP denotes a signal peptide for the secretory pathway, shown as “S” in the supplemental table, and “–” denotes any other location. The reliability class (RC) has five levels, in which “1” indicates the strongest prediction. TPlen is the predicted presequence length.
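The rule that a SEP is reported as positive only when both predictors agree amounts to a set intersection, which the following Python sketch illustrates. The dictionaries stand in for already-parsed predictor results; the parsing itself, the function name, and the SEP calls shown are all hypothetical and are not taken from the supplemental table.

```python
def consensus_signal_peptides(phobius_calls, targetp_calls):
    """Return the SEP names called positive by both predictors.

    Each argument maps a SEP name to True (signal peptide predicted) or
    False. A SEP missing from either mapping is treated as negative.
    """
    return sorted(
        name
        for name, positive in phobius_calls.items()
        if positive and targetp_calls.get(name, False)
    )


if __name__ == "__main__":
    # Hypothetical calls, for illustration only.
    phobius = {"SEP_a": True, "SEP_b": True, "SEP_c": False}
    targetp = {"SEP_a": True, "SEP_b": False, "SEP_c": True}
    print(consensus_signal_peptides(phobius, targetp))
```

Requiring agreement between two methods trades sensitivity for specificity, so a SEP called by only one predictor is not reported as carrying a signal peptide.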

Conservation analyses

The corresponding nucleotide sequences for the SEP3, SEP12, SEP33, and SEP54 ORFs were obtained from the NCBI database (https://www.ncbi.nlm.nih.gov/), as reported previously (Lee et al. 2015). A BLAST search was performed to ensure correct extraction of the nucleotide sequences. The protein sequences of six species, human (Homo sapiens), chimpanzee (Pan troglodytes), swine (Sus scrofa), dog (Canis lupus familiaris), rat (Rattus norvegicus), and mouse (Mus musculus), were aligned using Clustal Multiple Alignment.

Immunoassay

The corresponding antigen peptide for each SEP was conjugated to Keyhole Limpet Hemocyanin (KLH) and injected into rabbits. The antigen sequences were as follows: RGRKFPQNAL for SEP3, SSKPIERSYMI for SEP12, RNKDAILEALRE for SEP33, and KAPEGAPSFGKA for SEP54. IgG-purified sera were used for the detection of serum SEPs by Western blot.

For Western blot, serum protein samples were prepared in Sample buffer with EDTA-free protease and phosphatase inhibitors (Thermo, USA), heated at 95 °C for 5 min, run on 10% Tricine gels, and transferred to 0.4 μm PVDF membranes (Merck, Germany) at 100 mA for 30 min. Membranes were blocked with 5% nonfat dry milk for 1 h at room temperature (RT) and incubated with primary antibody (1:500–1:2,000 dilution) overnight at 4 °C, followed by secondary HRP-conjugated antibodies (1:10,000) for 1 h at RT. Chemiluminescence was detected and imaged using ECL (PerkinElmer Life Sciences, Waltham, MA).

Statistical analyses

Data are presented as mean ± SEM unless otherwise indicated. The statistical analyses were performed using GraphPad Prism 6. Comparisons between groups were performed using Student's t tests as indicated.
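For readers who want to see what these summary statistics involve, the following Python sketch computes mean ± SEM and a two-sample Student's t statistic using only the standard library. It is not the GraphPad Prism procedure. The unpaired, equal-variance (pooled) form of the test is an assumption, since the text does not specify the variant, and the example values are invented. Converting the t statistic to a p value requires the t distribution and is left out.

```python
import math
import statistics


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def student_t(group_a, group_b):
    """Unpaired Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal variances in both groups.
    """
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = (
        (na - 1) * statistics.variance(group_a)
        + (nb - 1) * statistics.variance(group_b)
    ) / df
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    return diff / math.sqrt(pooled_var * (1 / na + 1 / nb)), df


if __name__ == "__main__":
    # Hypothetical measurements for two groups, not data from the study.
    group_1 = [24.1, 25.3, 23.8, 24.9, 25.0]
    group_2 = [48.2, 50.1, 47.5, 49.3, 51.0, 48.8]
    print(mean_sem(group_1), mean_sem(group_2))
    print(student_t(group_1, group_2))
```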