1 Introduction

Identification of the chemical composition and structure of an unknown substance without any prior information remains a challenging task despite advances in chemistry and mass spectrometry. Such challenges are occasionally encountered in forensic chemistry and in doping analysis in human and equine sports, where discoveries of small-molecule “designer” drugs used to evade detection have been reported [1–3]. Analysis of unknown samples containing substances that are not known to the analytical chemist at the time of analysis, but are already reported in the literature, may be feasible if initial analyses can reveal possible candidates and relevant reference standards are available for comparison. However, analysis of samples containing totally unknown substances not documented in the literature is far more difficult.

Recently, we undertook such a difficult analysis when the Pennsylvania (PA) Harness Racing Commission submitted for identification a 6-mL plastic syringe containing an unknown clear liquid (~200 μL) seized during a barn search at a racetrack in PA. The intelligence gathered on the unknown sample suggested “the syringe contained some form of pain medication or morphine.” No additional information was provided. Initial accurate-mass spectrometric analysis of the unknown liquid excluded the presence of morphine. The lack of additional information compounded the difficulty of identifying the unknown sample. Nevertheless, our initial investigation by mass spectrometry suggested that the unknown liquid contained a cyclic peptide as the major component. Although sequence determination of linear peptides can be routinely achieved by mass spectrometry [4–6], sequencing of cyclic peptides is complicated. There have been reports on sequencing of cyclic peptides by fast atom bombardment (FAB) [7–12], Fourier transform ion cyclotron resonance (FTICR) [13], triple quadrupole [14], and ion trap [15–18] mass spectrometry, as well as reviews [19, 20]. However, most of these studies used known cyclic peptides to develop sequencing methodologies [8–10, 12–15, 17], while only a few publications reported sequencing of unknown cyclic peptides [11, 21–23]. Even in those few publications, the quantities of sample available were sufficient for chemical processing, Edman degradation analysis, and nuclear magnetic resonance (NMR) measurements.

In the present study, the volume of the sample was very limited (~200 μL) and concentration of the unknown cyclic peptide was low (estimated to be <100 μg/mL). These limitations excluded utilization of other techniques such as nuclear magnetic resonance (NMR) and acid hydrolysis followed by Edman degradation that were used as supplementary tools for sequencing of cyclic peptides in previous publications. Nonetheless, we have determined the sequence of the unknown cyclic peptide by collision-induced dissociation (CID) and electron transfer dissociation (ETD) tandem mass spectrometry. Additionally, multi-stage CID mass spectrometry was evaluated for sequencing cyclic peptides containing Arg and Lys residues.

2 Experimental

2.1 Chemicals

Three cyclic peptides, cyclo[Arg-Ala-Asp-D-Phe-Lys] (cyclo[RADfK]), cyclo[Arg-Gly-Asp-D-Phe-Lys] (cyclo[RGDfK]), and cyclo[Arg-Gly-Asp-D-Tyr-Lys] (cyclo[RGDyK]) were purchased from AnaSpec (Fremont, CA, USA). Each of them was dissolved in acetonitrile/water (50/50, vol/vol) resulting in stock solutions of 1.0 mg/mL. Working solution of each cyclic peptide at 10 μg/mL was prepared by diluting the stock solution with acetonitrile/water/formic acid (50/50/0.1, vol/vol/vol).

Acetonitrile and water (both Optima grade) were obtained from Fisher Scientific (Pittsburgh, PA, USA), formic acid (99%) was from EMD Chemicals via Scientific Equipment Co. (Aston, PA, USA), and trypsin (sequencing grade modified) was from Promega (Madison, WI, USA).

2.2 Preparation of the Unknown Sample for LC-MS Analyses

The sample was diluted to 1/10 (10 μL/100 μL) or 1/25 (8 μL/200 μL) with H2O/acetonitrile/formic acid (50/50/0.1, vol/vol/vol) for syringe infusion or LC-MS analyses.

2.3 Tryptic Digestion

An aliquot (8 μL) of the undiluted unknown sample was transferred to a 1.5-mL microcentrifuge tube (Fisher Scientific) containing 164 μL of 50 mM ammonium bicarbonate (pH 7.8). To the mixture was added 20 μL of trypsin in the bicarbonate buffer (20 μg/100 μL). The mixture was shaken briefly and then incubated in a water bath at 37 °C for 3 h. The digestion reaction was quenched by adding 8 μL of 10% formic acid, and the digest was analyzed immediately or stored at –70 °C until analyzed.

2.4 LC-MS Analyses

All mass spectrometric, tandem mass spectrometric or LC-MS/MS analyses were conducted using either an LTQ XL linear ion trap with ETD (Thermo Fisher Scientific, San Jose, CA) or a high resolution and high mass accuracy hybrid LTQ Orbitrap mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) with an IonMax electrospray ionization (ESI) source interfaced to a Surveyor Plus liquid chromatograph with an online degasser and a Surveyor Plus auto-sampler. The linear ion trap was mass calibrated monthly, and the LTQ Orbitrap weekly using calibration standards as per instrument manuals. Mass accuracy of the Orbitrap was better than 5 ppm with external calibration, according to the manufacturer’s specification. The mass accuracy in both MS and MS/MS modes was checked daily with syringe infusion of individual clenbuterol and bradykinin fragment 2–9 solutions (each 1 μg/mL), and it was determined to be within 2 ppm. The Orbitrap was operated at a resolving power of 100,000 (FWHM, at m/z 400) unless specified otherwise.

LC separations, when necessary, were carried out using a wide-pore Zorbax 300SB C18 column (50 × 1.0 mm i.d., 3.5 μm) with a Zorbax StableBond guard column (17 × 1.0 mm i.d., 5 μm; Agilent, Wilmington, DE, USA) maintained at 26 °C. A mobile phase gradient was used for elution of the unknown peptide and its tryptic digest, and it consisted of Solvent A (H2O/acetonitrile/formic acid, 95/5/0.1, vol/vol/vol) and Solvent B (H2O/acetonitrile/formic acid, 5/95/0.1, vol/vol/vol). The gradient was programmed as follows: 0% B (0–3 min) was increased to 2% B (3–5 min), to 50% B (5.0–5.5 min), and to 80% B (5.5–6.0 min), held at 80% B (6.0–10.0 min), decreased to 0% B (10.0–10.5 min), and held at 0% B (10.5–16.0 min). The mobile phase flow was also programmed: 50 μL/min (0–5.5 min) was increased to 100 μL/min (5.5–6.0 min), held at 100 μL/min (6.0–15.0 min), decreased to 50 μL/min (15.0–15.5 min), and held at 50 μL/min (15.5–16.0 min). Twenty microliters of the diluted unknown sample or its tryptic digest was injected for analysis.
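The gradient and flow programs described above can be written down as breakpoint tables. The following is a minimal sketch only: the names `PERCENT_B`, `FLOW_UL_MIN`, and `interpolate` are hypothetical, and it assumes that each change between stated values is a linear ramp, which the text does not state explicitly.

```python
# Breakpoints (time in min, value) transcribed from the gradient description.
# Assumption: every change between two stated values is a linear ramp.
PERCENT_B = [(0.0, 0.0), (3.0, 0.0), (5.0, 2.0), (5.5, 50.0), (6.0, 80.0),
             (10.0, 80.0), (10.5, 0.0), (16.0, 0.0)]
FLOW_UL_MIN = [(0.0, 50.0), (5.5, 50.0), (6.0, 100.0), (15.0, 100.0),
               (15.5, 50.0), (16.0, 50.0)]


def interpolate(program, t):
    """Return the programmed value at time t (min) by linear interpolation."""
    if not program[0][0] <= t <= program[-1][0]:
        raise ValueError("time outside the programmed run")
    for (t0, v0), (t1, v1) in zip(program, program[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Example: composition and flow rate midway through the 5.5-6.0 min ramp.
percent_b_mid_ramp = interpolate(PERCENT_B, 5.75)
flow_mid_ramp = interpolate(FLOW_UL_MIN, 5.75)
```

Keeping both programs on the same time axis makes it easy to see that the flow rate is doubled during the same half-minute in which Solvent B rises from 50% to 80%.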

Both the linear ion trap and LTQ Orbitrap were operated in the positive ion mode. The ion transfer capillary temperature of each instrument was set at 275 °C for syringe infusion experiments, and at 325 °C for LC-MS analyses. ESI source parameters for both instruments were optimized with infusion of bradykinin fragment 2–9 solution (1 μg/mL), for syringe infusion and LC flow rates of 5 and 50 μL/min, respectively. For CID experiments using the LTQ XL and LTQ Orbitrap, the isolation width was set at 1.5, the standard activation Q value (0.25) and activation time (30 ms) were used, and wideband activation was enabled. Helium (dampening gas) was used as the collision gas for CID. For ETD experiments with the ion trap instrument, fluoranthene was employed to generate reagent anions. Reagent ion source settings were as follows: temperature, 160 °C; emission current, 50 μA; electron energy, –70 V; chemical ionization (CI) pressure, 20 psi; max inject time, 50 ms; automatic gain control (AGC) target, 3 × 10^5. The isolation width was set at 1.5, and the activation time at 100 ms. Data acquisition and analysis were accomplished with Xcalibur software (ver. 2.0.6, Thermo Electron Corp., San Jose, CA, USA).

3 Results

3.1 Analysis of the Unknown Sample by Full-Scan Mass Spectrometry

The unknown sample was analyzed using the LTQ Orbitrap, and the full-scan accurate-mass spectrum (Figure 1) suggests that it contains a small peptide as the major component, as evidenced by the predominant singly, doubly, and triply charged species at m/z 640.3922, 320.6998, and 214.1358, respectively. The charge state of the three ion species was determined by the isotopic peak distributions (not shown). The accurate-mass values of the three ion species indicate that they were formed from a single peptide (640.3922–1.0073 = 639.3849 from the singly charged species, 2 × 320.6998–2 × 1.0073 = 639.3850 from the doubly charged, 3 × 214.1358–3 × 1.0073 = 639.3855 from the triply charged). Observation of the charge state of up to three (triply charged) for such a small peptide is unusual and implies that it contains two basic amino acid residues such as Arg, Lys, or His.
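The charge-state arithmetic above can be reproduced in a few lines. This is a minimal sketch, assuming each species is a simple protonated ion [M + zH]z+ and using the proton mass rounded to 1.0073 Da as in the text; the names `neutral_mass`, `observed`, and `spread` are illustrative only.

```python
PROTON = 1.0073  # proton mass in Da, rounded as in the text


def neutral_mass(mz, charge):
    """Neutral monoisotopic mass implied by a protonated ion [M + zH]z+."""
    return charge * mz - charge * PROTON


# m/z values of the singly, doubly, and triply charged species (Figure 1).
observed = {1: 640.3922, 2: 320.6998, 3: 214.1358}
masses = {z: round(neutral_mass(mz, z), 4) for z, mz in observed.items()}

# Spread between the three estimates of the neutral mass, in Da.
spread = max(masses.values()) - min(masses.values())
```

The three estimates agree to within about 0.001 Da, which supports the assignment of all three ions to a single peptide of neutral mass near 639.385 Da.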

Figure 1

Full-scan accurate-mass spectrum of the unknown sample by the Orbitrap indicating that the major component is a peptide, of which the singly, doubly, and triply charged species are m/z 640.3922, 320.6998, and 214.1358, respectively. One-tenth dilution of the unknown sample was syringe-infused at 5 μL/min into the ESI source

Examination of a highly enlarged portion around m/z 286 of the full-scan mass spectrum (S-Figure 1 in the online supplementary material) excluded the presence of any morphine in the unknown sample — the two mass spectrometric peaks at m/z 286.1303 (relative intensity 0.0055% to the base peak at m/z 320.6998) and m/z 286.1815 (relative intensity 0.07% to the base peak) are too far from the exact mass m/z 286.1438 for protonated morphine (C17H19NO3+H+). Similarly, further examination of the accurate-mass spectrum of the unknown sample ruled out some 140 analgesic drugs in the Merck Index (ver. 13.4) from consideration. Furthermore, all peaks (except those of the unknown peptide) with relative intensity greater than 4% in the accurate-mass spectrum of the unknown sample were examined for known drugs in the Merck Index (a comprehensive drug handbook). The Merck Index (electronic version) was manually searched using nominal masses from those peaks (for example, 392 and 393 for the peak at m/z 393.2091), for candidate drugs. Theoretical mass values for formulae of the candidate drugs plus a proton were calculated and compared with the measured accurate mass values, and a drug is considered to be possibly present when the difference between the theoretical and measured mass value is less than 5 ppm (Table 1). Betamethasone, dexamethasone, and paramethasone are possible drugs for the ion of m/z 393.2091, ethyl dibunate is a possible drug for m/z 349.1829, actinobolin is for m/z 301.1408, and muscarine for m/z 174.1490. However, none of the possible drugs is really present in the unknown sample, as suggested by the results from MS/MS experiments on those ions (data not shown). Based on the above results, it was concluded that the unknown sample does not contain any morphine or known drug, but a small peptide as its major component. Subsequent investigations focused on identifying the unknown peptide.
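The 5 ppm acceptance criterion used above can be illustrated with the two peaks near m/z 286. This is a sketch of the comparison only; the function names `ppm_error` and `is_candidate` are hypothetical, and the masses are those quoted in the text.

```python
def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def is_candidate(measured, theoretical, tol_ppm=5.0):
    """True when the measured m/z lies within tol_ppm of the theoretical m/z."""
    return abs(ppm_error(measured, theoretical)) < tol_ppm


MORPHINE_MH = 286.1438              # protonated morphine, C17H19NO3 + H+
nearby_peaks = (286.1303, 286.1815)  # peaks observed near m/z 286

errors = [ppm_error(mz, MORPHINE_MH) for mz in nearby_peaks]
matches = [is_candidate(mz, MORPHINE_MH) for mz in nearby_peaks]
```

Both peaks deviate from the theoretical value by tens of ppm or more, far outside the 5 ppm window, so neither is consistent with protonated morphine.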

Table 1 Accurate mass values for the mass spectral peaks except those of the unknown peptide recorded with the unknown sample (Figure 1) and their possible relevance to known drugs

3.2 Exploring Amino Acid Sequence of the Unknown Peptide with CID MS/MS

De novo sequencing by MS/MS was attempted to identify the unknown peptide. The accurate-mass CID product ion spectrum of the doubly charged unknown peptide (Figure 2 and Table 2) indicates the presence of Lys (not Gln), Phe, and Arg residues. Specifically, the base peak at m/z 289.6760 in the product ion spectrum (top panel in Figure 2) is a doubly charged species and relates to the loss of two ammonia (NH3) molecules and a carbon monoxide (CO) molecule from the precursor ion. The doubly charged unknown peptide of m/z 320.6998 simultaneously loses an NH3 molecule and a protonated Lys residue [a singly charged species of m/z 129.1024 that was observed in the product ion spectrum, which was neither the immonium ion of Arg (m/z 129.1135) nor a neutral Glu residue (129.0420)], resulting in the product ion of m/z 495.2716. In the product ion spectrum (top panel in Figure 2), the mass difference between the peaks at m/z 495.2716 and 348.2030 is 147.0686, which relates to a Phe residue (147.0684). The mass difference between the peaks at m/z 331.1764 and 175.0755 corresponds to an Arg residue (156.1009 observed versus 156.1011 theoretical). An FK/KF b2 ion was observed at m/z 276.1706, as was an FRK (or any combination) b3 ion at m/z 432.2720. The mass difference between the doubly charged unknown peptide (2 × 320.6998) and the singly charged FRK b3 product ion (432.2720) was 209.1276, a singly charged species for the unidentified portion of the unknown peptide. For the accurate-mass value of 209.1276, theoretically, there is only one possible elemental composition, C11H17O2N2 (Δ mass = –4.1 ppm, RDB = 4.5), if the nitrogen rule is applied to the even-electron ion and mass error is limited to within 5 ppm in calculating elemental composition. This singly charged species had to be determined in order to identify the unknown peptide.
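The mass error and ring-plus-double-bond (RDB) value quoted for C11H17O2N2 can be checked as follows. This sketch verifies the single reported composition only; it does not enumerate alternative formulae, and the helper names `cation_mass` and `ring_double_bonds` are hypothetical. Monoisotopic atomic masses are standard values.

```python
# Standard monoisotopic atomic masses (Da) and the electron mass.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
ELECTRON = 0.00054858


def cation_mass(formula, charge=1):
    """m/z of a cation whose elemental formula already includes the extra H."""
    neutral = sum(MONO[element] * count for element, count in formula.items())
    return (neutral - charge * ELECTRON) / charge


def ring_double_bonds(formula):
    """Rings plus double bonds: C - H/2 + N/2 + 1 (half-integer for even-electron ions)."""
    return (formula.get("C", 0) - formula.get("H", 0) / 2
            + formula.get("N", 0) / 2 + 1)


candidate = {"C": 11, "H": 17, "N": 2, "O": 2}
measured = 209.1276

theoretical = cation_mass(candidate)
ppm = (measured - theoretical) / theoretical * 1e6
rdb = ring_double_bonds(candidate)
```

Under these assumptions the theoretical m/z is close to 209.1285, the error is about –4.1 ppm, and the RDB value is 4.5, in line with the figures reported above.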

Figure 2

CID product ion spectra of the doubly charged unknown peptide of m/z 320.9 (top panel) and its tryptic digest of m/z 321.3 (bottom panel) indicating the presence of Lys, Phe, and Arg residues. Unannotated are the peaks at m/z 478.2452, 467.2763, 450.2494, 331.1764, 321.2019, 276.1706, 175.0755, and 147.0806 in the top panel, and 478.2446, 467.2759, 450.2816, 432.2714, 331.1762, 276.1702, 175.0753, and 147.0804 in the bottom panel. The unknown peptide in 1/25 dilution and its tryptic digest were analyzed by LC-MS/MS (see S-Figure 3 for LC-MS/MS chromatograms). The collision energy was 30% for CID of both peptides

Table 2 Interpretation of CID product ion peaks of the doubly charged unknown peptide and its digest from accurate-mass measurements

In the product ion spectrum of the doubly charged unknown peptide (top panel in Figure 2), the peaks at m/z 175.0755 and 147.0806 are not Arg y1 and Lys y1 product ions (theoretical m/z 175.1190 and 147.1128, respectively) but relate to the singly charged species of m/z 209.1276 mentioned above. It loses two NH3 molecules to generate the ion of m/z 175.0755 (209.1276–175.0755 = 34.0521 versus a theoretical mass of 34.0531 for two NH3 molecules) and further loses a CO molecule, resulting in the ion of m/z 147.0806 (175.0755–147.0806 = 27.9949 versus 27.9949 for a CO molecule). Similarly, the peak at m/z 164.1072 in the product ion spectrum also relates to the singly charged species of m/z 209.1276: the latter loses an NH3 molecule and a CO molecule, resulting in the ion of m/z 164.1072 (209.1276–164.1072 = 45.0204 versus 45.0214 for an NH3 molecule and a CO molecule).
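The neutral-loss assignments above amount to matching an observed mass difference against small combinations of NH3 and CO. The sketch below illustrates that matching; the function name `assign_loss`, the 0.002 Da tolerance, and the limits of two NH3 and one CO are assumptions chosen for illustration, not parameters from the study.

```python
from itertools import product

NH3 = 17.026549  # monoisotopic mass of ammonia, Da
CO = 27.994915   # monoisotopic mass of carbon monoxide, Da


def assign_loss(observed, tol=0.002, max_nh3=2, max_co=1):
    """Return (n_NH3, n_CO) best matching an observed loss, or None if no match."""
    best = None
    for n_nh3, n_co in product(range(max_nh3 + 1), range(max_co + 1)):
        if n_nh3 == 0 and n_co == 0:
            continue  # a loss of nothing is not an assignment
        error = abs(observed - (n_nh3 * NH3 + n_co * CO))
        if error <= tol and (best is None or error < best[2]):
            best = (n_nh3, n_co, error)
    return None if best is None else best[:2]


parent = 209.1276
assignments = {mz: assign_loss(parent - mz) for mz in (175.0755, 164.1072)}
co_step = assign_loss(175.0755 - 147.0806)
```

With these settings, the ion at m/z 175.0755 is assigned to the loss of two NH3, the ion at m/z 164.1072 to the loss of one NH3 and one CO, and the step from m/z 175.0755 to 147.0806 to the loss of CO alone.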

All major peaks were accounted for, except one at m/z 321.2019 in the product ion spectrum of the doubly charged unknown peptide. Presence of only b-type product ions and absence of y-type product ions in the product ion spectrum suggest a cyclic sequence for the unknown peptide, because a cyclic peptide, in theory, does not generate any y ions due to the lack of a C-terminal hydroxyl group [24, 25].

3.3 Characterizing the Unknown Peptide by Tryptic Digestion

The unknown peptide contains Lys and Arg residues and may be cyclic; tryptic digestion is a useful and convenient tool for verifying the cyclic sequence, based on the known resistance of cyclic peptides to tryptic digestion [26–28]. The unknown peptide was subjected to tryptic digestion and subsequent LC-MS analysis. No tryptic fragments were detected from the digestion, which is consistent with a cyclic nature of the unknown peptide. Unexpectedly, deamidation of the unknown peptide was observed from the digestion. In the full-scan accurate-mass spectrum of the “tryptic digest” of the unknown peptide (S-Figure 2), peaks of the singly and doubly charged species at m/z 641.3772 and 321.1922 were observed, and they are 0.9850 and 0.4924 mass units higher than the corresponding singly and doubly charged unknown peptide. The observed mass increase of 0.9850 Da resulting from trypsin digestion of the unknown peptide indicates that it was deamidated (–NH3 + H2O = –17.0265 + 18.0106 = 0.9841). Furthermore, comparison between product ion spectra of the doubly charged unknown peptide and its “tryptic digest” (Figure 2 and Table 2) also indicates occurrence of the deamidation. Specifically, the doubly charged unknown peptide loses upon CID both an NH3 molecule and a protonated Lys residue, leading to the peak at m/z 495.2716 (top panel in Figure 2). In contrast, the doubly charged “tryptic digest” of the unknown peptide loses an H2O molecule (513.2822–495.2712 = 18.0110) after losing a protonated Lys residue (2 × 321.1922–513.2822 = 129.1022, bottom panel in Figure 2). The above interpretation of the product ion spectra confirmed deamidation of the unknown peptide resulting from the tryptic digestion. Such deamidation is unusual and unexpected because trypsin is well known for its specificity of cleaving at the C-terminus of only Arg or Lys residues in a peptide or protein but is not known for deamidation activity. However, rare cases of deamidation of Gln/Asn in proteins by trypsin have been reported, where the tertiary structure is the principal determinant of the deamidation [29]. In the present study, the deamidation of the unknown peptide suggests that it might contain either a Gln or an Asn residue.
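The deamidation argument rests on a charge-corrected mass shift, which can be sketched as follows. The helper name `shift_in_da` is hypothetical; the m/z values are those reported for the peptide and its digest, and the molecular masses are standard monoisotopic values.

```python
NH3 = 17.026549  # monoisotopic mass of ammonia, Da
H2O = 18.010565  # monoisotopic mass of water, Da

# Net mass change when an amide (-CONH2) is converted to an acid (-COOH).
DEAMIDATION_SHIFT = H2O - NH3


def shift_in_da(mz_before, mz_after, charge):
    """Mass shift in Da implied by an m/z shift at a given charge state."""
    return (mz_after - mz_before) * charge


singly = shift_in_da(640.3922, 641.3772, 1)
doubly = shift_in_da(320.6998, 321.1922, 2)
```

Both charge states give a shift within about 0.001 Da of the expected +0.984 Da, so the two observations point to the same single deamidation event.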

The observed direct loss of an amino acid residue (rather than an intact amino acid or amino amide) such as a protonated Lys residue from the tryptic digest of the unknown peptide in CID described above is an indication of a cyclic peptide. All results to this point suggest that the unknown peptide is a cyclic one.

3.4 Attempts at Sequencing the Unknown Cyclic Peptide by Multi-Stage Mass Spectrometry

Multi-stage tandem mass spectrometry (MSn) is the technique of choice for sequencing cyclic peptides; it was first reported by Gross’ group, who demonstrated sequencing of seven known cyclic peptides [15]. A recent publication also reported on sequencing of three known cyclic peptides [16]. We attempted to sequence the unknown cyclic peptide using this technique on the LTQ XL ion trap. The first-generation product ion of m/z 495 from MS/MS of the doubly protonated unknown cyclic peptide (Table 2) was isolated in the ion trap and subjected to CID, and a second-generation product ion of m/z 348 was generated (S-Figure 4), indicating loss of a Phe residue. This product ion of m/z 348 was subjected to further CID, but the resulting product ion spectrum did not reveal loss of an amino acid residue (S-Figure 4). Another first-generation product ion of m/z 478 from MS/MS of the doubly protonated unknown cyclic peptide fragmented upon CID to a second-generation product ion of m/z 331, and the latter fragmented further to m/z 175 (S-Figure 5), indicating sequential loss of a Phe residue and an Arg residue. According to Gross et al. [15], a protonated cyclic peptide opens the ring upon CID to generate isomeric protonated linear peptides, and the latter lose an amino acid residue from the C-terminus. The results mentioned above suggest the connectivity of the Lys, Phe, and Arg residues to be RFK. CID of the first-generation product ion of m/z 432 from the doubly protonated unknown peptide resulted in two series of second-generation product ions: one indicating sequential loss of a Lys residue and a Phe residue, and the other that of an Arg residue and a Phe residue (S-Figure 6). In short, the MSn results suggest only a partial and contradictory sequence for the unknown peptide.

3.5 Elucidation of Amino Acid Sequence of the Unknown Peptide by ETD

ETD is a recently developed technique for sequencing of peptides and proteins [30]. Although it has not been used in sequencing of cyclic peptides, ETD may provide valuable sequencing information supplementary to that from CID. Thus, it was employed in the effort to elucidate the sequence of the unknown peptide. The triply charged unknown peptide of m/z 214.3 was subjected to ETD because a higher charge state of a peptide is more suitable for ETD. The product ion spectra of the unknown peptide and its tryptic digest (Figure 3) indicate the presence of Gln, Phe, and Arg residues, and of Glu, Phe, and Arg residues, respectively. The connectivity of the Gln, Phe, and Arg residues is considered to be Arg-Phe-Gln or its retro sequence Gln-Phe-Arg (Figure 3). Other ion species such as the singly and doubly protonated unknown peptide, demethylation species, and deoxygenation forms in the ETD product ion spectra (Figure 3) could be interpreted by the free radical reaction cascade model [31].

Figure 3

ETD product ion spectra of the triply charged unknown peptide of m/z 214.3 (top panel) and its tryptic digest of m/z 214.4 (bottom panel) indicating the presence of Q, F, and R residues in the unknown peptide, and of E, F, and R residues in the digest of the unknown peptide. The peaks at m/z 640.4 and 320.8 (top panel) relate to the singly and doubly protonated unknown peptide; the peaks at m/z 641.4 and 321.3 (bottom panel) are similar ion species from the tryptic digest of the unknown peptide. The peak at m/z 214.1 is from LC solvent/plasticware. The unknown sample in 1/25 dilution and its tryptic digest were analyzed by LC-MS/MS (see S-Figure 7 for LC-MS/MS chromatograms)

3.6 Amino Acid Sequence Proposed for the Unknown Peptide

The CID results described above indicate a possible partial sequence, RFK, for the unknown peptide. The remaining unidentified portion of the peptide was determined by accurate-mass measurements to have the elemental composition C11H16N2O2 (C11H17N2O2 for the singly charged form), which does not correspond to a standard amino acid but, most likely, to a modified one. The ETD results suggest that the unknown peptide contains a subsequence, RFQ or QFR. After all the results from CID, ETD, and tryptic digestion are pieced together, cyclo[Arg-Lys-N(C6H9)Gln-Phe] (Scheme 1) is proposed as the sequence for the unknown peptide. In the proposed sequence, the unusual N(C6H9)-Gln residue has a C6H9 group attached to the amino nitrogen atom. The chemical structure of the C6H9 group could not be readily determined, but the most likely structure might be 4-cyclohexenyl (or 3-cyclohexenyl). It should be emphasized that the C6H9 substituent plays an essential role in deduction of the unknown cyclic peptide sequence, and that absence of any substituent on the amide nitrogen atom of the Lys residue would make it impossible to differentiate between the correct and retro sequences.

Scheme 1

Primary structure proposed for the unknown peptide, and CID fragmentation pathways proposed for loss of a protonated lysine residue (m/z 129) from the doubly protonated cyclic tetrapeptide and for generation of the product ion at m/z 432. Arg, Gln, Lys, and Phe in this scheme represent the side chains of the relevant amino acids

3.7 Reconciling the Proposed Sequence with the Experimental Results

The sequence proposed above for the unknown peptide can be reconciled with all the experimental results. First, the exact mass calculated for the proposed sequence in a singly protonated state (559.3225 for RKFQ + 81.0704 for C6H9 = 640.3929, which already includes the exact mass of a proton) is in agreement with that observed for the singly charged unknown peptide (640.3922), with an error of –1.1 ppm. Second, the proposed cyclic peptide can be singly, doubly, or triply protonated because Arg, Lys, and the amide nitrogen of the unusual Gln (this nitrogen atom is more basic than the other amide nitrogen atoms because it has an alkyl substituent) can be individually or collectively protonated. This prediction is in line with the observed charge state envelope of the unknown peptide. Third, with the proposed sequence, some of the observed major product ions of the unknown peptide from CID (Figure 2) can be interpreted. Specifically, the doubly protonated cyclic peptide may lose upon CID a protonated Lys residue or the unusual residue of N-(C6H9)-Gln (Scheme 1), resulting in the product ion of m/z 512 (Table 2 and S-Figure 8) or 432. The product ion of m/z 512 can further lose an NH3 molecule to yield the product ion of m/z 495. Finally, and most importantly, the proposed sequence can help explain the puzzling observation that Lys, Phe, and Arg residues were observed by CID, whereas Gln, Phe, and Arg were detected by ETD in the unknown peptide (Scheme 2). In CID, any of the amide bonds in the proposed cyclic peptide can be cleaved; loss of Lys, Phe, and Arg residues may be experimentally observed, but the unusual Gln residue cannot be detected in the form of an ordinary Gln (Scheme 2). In ETD, however, cleavages can take place at any of the bonds between an amide nitrogen and an α carbon; loss of Gln, Phe, and Arg residues may be observed, but no ordinary Lys residue can be detected because the Lys would appear as an unusual residue in the form of Lys-N(C6H9) (Scheme 2).
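The exact-mass check for the proposed sequence can be reproduced from standard monoisotopic residue masses. This is a sketch under stated assumptions: the cyclic peptide mass is taken as the sum of its residue masses, the C6H9 substituent is modeled as replacing one hydrogen (a net gain of C6H8), and the names `RESIDUE`, `C6H8`, and `cyclic_mh` are illustrative.

```python
# Standard monoisotopic residue masses (Da) for the four amino acids involved.
RESIDUE = {"R": 156.101111, "K": 128.094963, "F": 147.068414, "Q": 128.058578}
PROTON = 1.007276

# Assumption: C6H9 replaces one H on the Gln amino nitrogen, a net gain of C6H8.
C6H8 = 6 * 12.0 + 8 * 1.00782503


def cyclic_mh(sequence, extra=0.0):
    """[M + H]+ of a head-to-tail cyclic peptide: residue masses + extra + proton."""
    return sum(RESIDUE[residue] for residue in sequence) + extra + PROTON


theoretical = cyclic_mh("RKQF", extra=C6H8)
measured = 640.3922
ppm = (measured - theoretical) / theoretical * 1e6
```

The calculated value rounds to 640.3929, and the error against the measured m/z 640.3922 is close to –1 ppm; the small difference from the –1.1 ppm quoted above comes from rounding of the theoretical mass.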

Scheme 2

Possible CID and ETD cleavage sites for the cyclic tetrapeptide explaining non-detectability of a normal Gln residue by CID and that of a normal Lys residue by ETD

4 Discussion

4.1 Preferential Loss of Lysine Residue in CID

In the CID product ion spectrum of the tryptically digested cyclic peptide (bottom panel in Figure 2), the dominant peak at m/z 513.2822 indicates preferential loss of a protonated Lys residue from the doubly protonated peptide. This loss is even more favorable than that of an H2O molecule because the spectrum indicates that the doubly protonated peptide first loses a Lys residue (2 × m/z 321.1922 → m/z 513.2822, Table 2) and then an H2O molecule (m/z 513.2822 → 495.2712). Similar preferential loss of a Lys residue from the cyclic peptide itself is also indicated: the minute peak at m/z 512.2978 in its product ion spectrum (S-Figure 8) suggests loss of a protonated Lys residue. It has been reported that deamination and dehydration are the prominent fragmentation losses that an N-terminal glutamine of protonated linear peptides undergoes in CID [32]. However, preferential loss of a protonated Lys residue over that of an H2O molecule was observed for the tryptically digested cyclic peptide in the present study. This preferential loss of a protonated Lys could be accounted for by the mechanism of amino acid side chain-assisted amide bond cleavage [33] (Scheme 1). Such preferential loss of a protonated Lys residue was also observed for the cyclic peptides cyclo[RADfK], cyclo[RGDfK], and cyclo[RGDyK] (S-Figure 9 and S-Table 1), which were chosen in this study because of their commercial availability. Similar loss of a Lys or Gln residue from the cyclic b5 product ion of a known linear peptide, YAKFLG or YAQFLG, was reported by other investigators [34].

4.2 MSn Sequencing of Cyclic Peptides Containing Arg and Lys Residues

MSn is the technique of choice for sequencing cyclic peptides [15], and was evaluated for sequencing known cyclic peptides containing Arg and Lys residues, cyclo[RADfK], cyclo[RGDfK], and cyclo[RGDyK], to determine its applicability to sequencing of the unknown peptide. Doubly protonated cyclo[RADfK] of m/z 309.8 fragments upon CID to protonated RADf of m/z 490.2 and KRAD of m/z 471.2, doubly protonated cyclo[RGDfK] of m/z 302.9 to protonated RGDf of m/z 476.2 and KRGD of m/z 457.2, and doubly protonated cyclo[RGDyK] of m/z 310.8 to protonated RGDy of m/z 492.2 and KRGD of m/z 457.2 (S-Figure 9 and S-Table 1). MS/MS spectra of the first-generation product ions, RADf of m/z 490.2, RGDf of m/z 476.2, and RGDy of m/z 492.2, clearly indicate the loss of a Phe, Phe, and Tyr residue, respectively (S-Figure 10). However, MS/MS spectra of the second-generation product ions, protonated RAD of m/z 343.2, RGD of m/z 329.2, and RGD of m/z 329.2, do not obviously reveal the loss of an Asp residue (S-Figure 11). The same is true for the MS/MS spectra of the first-generation product ions, protonated KRAD of m/z 471.2, KRGD of m/z 457.2, and KRGD of m/z 457.2 (S-Figure 12). These results indicate that MSn can provide only partial sequence information for cyclic peptides containing adjacent Arg and Lys residues. Thus, MSn may not be able to provide complete sequence information for the unknown peptide, which is consistent with the experimental results.

4.3 Sequencing of Cyclic Peptides by ETD

Peptides have conventionally been sequenced by a combination of chemical or enzymatic degradation, mass spectrometry, and NMR, but a certain quantity of the peptide is required. As modern mass spectrometry advances, linear peptides in small quantities can be easily sequenced. However, sequencing of cyclic peptides is complicated. To our knowledge, there has been no report on the use of ETD for sequencing of a cyclic peptide; ETD has been utilized in sequencing of linear peptides, and cyclic peptides have been used in studying the fragmentation mechanism of electron capture dissociation (ECD) [31, 35]. The result of the present study demonstrates that ETD is useful for sequencing a cyclic peptide.

The major product ions in the ETD product ion spectrum of the cyclic tetrapeptide (top panel in Figure 3), m/z 512.3, 433.2, 365.3, 277.1, and 209.1, can be accounted for by the fragmentation pathways proposed (Scheme 3), which are based on the ECD mechanisms [36] and the ECD free radical reaction cascade model [31] because ETD resembles ECD in fragmentation mechanism [30]. The triply protonated cyclic tetrapeptide can accept one or two electrons in ETD, resulting in doubly or singly charged species of the triply protonated cyclic peptide. As depicted in Scheme 3, the doubly charged species generates intermediate products with the free radical on the α carbon of either the Arg or Phe residue, and they lead to the QF pseudo-z2 (zp2) product ion of m/z 277.1 (pseudo z- and pseudo b-type product ions will be defined later) and the pseudo-b2 (bp2) ion of m/z 365.3, the bp3 product ion of m/z 512.3, and the [Q zp1 – OH] ion of m/z 113.2 (top panel of Figure 3), respectively. The proposed pathway for generation of the [Q zp1 – OH] ion (Scheme 3) can explain the lack of a relevant [E zp1 – OH] ion in the ETD product ion spectrum of the tryptically digested cyclic tetrapeptide (bottom panel in Figure 3); the side-chain carboxylic carbonyl group of the Glu residue in the digested cyclic tetrapeptide is not sufficiently basic to retain the positive charge, compared with the side-chain amide carbonyl group of the Gln residue in the cyclic tetrapeptide. In addition, the singly charged species of the triply protonated cyclic tetrapeptide generates the QFR zp3 ion (m/z 433.2) or the bp1 product ion (m/z 209.1), depending on the location of the charge (Scheme 3). The proposed fragmentation pathways can account for all the major product ions observed for the cyclic tetrapeptide.

Scheme 3

ETD fragmentation pathways proposed for generation of product ions from the cyclic tetrapeptide

The major product ions in the ETD product ion spectrum of the cyclic tetrapeptide (top panel in Figure 3) can be categorized into two series: m/z 277.1 and 433.2; m/z 209.1, 365.3, and 512.3. The first series of product ions (m/z 277.1 and 433.2) are named pseudo-z ions in the present study because they are similar in chemical structure to relevant z-type product ions from a linear peptide but one mass unit less than the z-type product ions due to replacement of the carboxylic group with the hydroxyliminomethyl group (QF z p2 ion in Scheme 3, for example). The m/z value of a pseudo-z ion equals that of the relevant b ion plus 1 (b ion – NH + NH + H, see the structure of QF z p2 ion in Scheme 3) if the charge state of the former is 1+. This knowledge would assist with recognition of pseudo-z ions. The other series of product ions (m/z 209.1, 365.3, and 512.3) is named pseudo-b product ions because they are the same in mass as relevant b-type product ions from a linear peptide but different in chemical structure from the b-type ions (b p1 ion in Scheme 3, for example). Similar to c and z product ions [37], bp and zp product ions are even- and odd-electron ions, respectively. The bp- and zp-type ions are useful for sequencing a cyclic peptide, as are c- and z-type product ions for sequencing a linear peptide. For instance, the peaks at m/z 209.1, 365.3, and 512.3 in the top panel of Figure 3 correspond to b p1 , b p2 , and b p3 product ions, respectively. Based on these bp ions, a partial sequence proposed would be N(C6H9)Lys-Arg-Phe, which is the retro-sequence of the correct partial sequence for the peptide. The peaks at m/z 113.1, 277.1, and 433.2 in the top panel of Figure 3 relate to z p1 , z p2 , and z p3 product ions, respectively. With these zp ions, a partial sequence proposed would be Arg-Phe-Gln, which is also the retro-sequence of the correct partial sequence. 
In short, the sequence read from the bp and zp ion peaks in an ETD product ion spectrum of an unknown cyclic peptide is its retro-sequence. Reversing the retro-sequence gives the correct sequence. It should be noted that this proposal for sequencing an unknown cyclic peptide is preliminary because it is based on the results from a single unknown peptide. Studies on more cyclic peptides are required to validate the proposal.
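The ladder reading and the "b ion plus 1" rule described above can be illustrated with a short calculation. The following Python sketch is not part of the original analysis: the function names, the 0.15 Da tolerance, and the use of a nominal +1.0 offset for pseudo-z ions are assumptions made for illustration, and the residue masses are standard monoisotopic values for the four residues discussed in the text.

```python
# Illustrative sketch only; names and tolerance are assumptions, not the authors' method.
# Standard monoisotopic residue masses (Da) for the residues discussed in the text.
RESIDUE_MASS = {
    "Arg": 156.10111,
    "Lys": 128.09496,
    "Gln": 128.05858,
    "Phe": 147.06841,
}
PROTON = 1.00728  # mass of a proton (Da)


def residues_from_ladder(mz_values, tol=0.15):
    """Assign a residue to each gap between consecutive singly charged ions.

    Returns None for a gap that matches no residue within `tol` Da.
    """
    assigned = []
    for lower, upper in zip(mz_values, mz_values[1:]):
        gap = upper - lower
        name, mass = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - gap))
        assigned.append(name if abs(mass - gap) <= tol else None)
    return assigned


def b_ion_mz(residues):
    """m/z of a singly charged b ion: sum of residue masses plus one proton."""
    return sum(RESIDUE_MASS[r] for r in residues) + PROTON


def pseudo_z_mz(residues):
    """Pseudo-z m/z taken as 'b ion plus 1' (nominal offset, as stated in the text)."""
    return b_ion_mz(residues) + 1.0


# bp ladder reported in the text; bp1 (m/z 209.1) is assigned to N(C6H9)Lys.
bp_ladder = [209.1, 365.3, 512.3]
gaps = residues_from_ladder(bp_ladder)   # residues added from bp1 to bp3
retro_sequence = ["N(C6H9)Lys"] + gaps   # sequence as read from the ladder
correct_sequence = list(reversed(retro_sequence))

print("retro:  ", "-".join(retro_sequence))
print("correct:", "-".join(correct_sequence))
print("QF  pseudo-z m/z:", round(pseudo_z_mz(["Gln", "Phe"]), 2))
print("QFR pseudo-z m/z:", round(pseudo_z_mz(["Gln", "Phe", "Arg"]), 2))
```

Under these assumptions, the gaps in the bp ladder are assigned to Arg and Phe, and the calculated pseudo-z values for the QF and QFR fragments fall within 0.1 of the observed peaks at m/z 277.1 and 433.2. Lys and Gln differ by only about 0.036 Da, so a tolerance this wide could not distinguish them; neither appears as a gap in this ladder.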

4.4 Possible Pharmacological Effects of the Identified Cyclic Peptide

Now that the major component of the unknown sample has been identified as a cyclic tetrapeptide, the next relevant question is what pharmacologic effects this cyclic peptide has. To address this question, extensive database searches were conducted: searches with SciFinder Scholar against Chemical Abstracts and Medline using the molecular formula (C32H49N9O5) as a query, together with a Substructure Search and a Similarity Search with the cyclo[Arg-Lys-Gln-Phe] structure drawn; and searches against the National Center for Biotechnology Information (NCBI) Protein Database with a query of Molecular Weight (639) filtered by Sequence Length (4). No exact match to the sequence proposed above or to cyclo[Arg-Lys-Gln-Phe] was found. Consequently, the pharmacologic effects of the cyclic tetrapeptide remain unknown, apart from the uncorroborated intelligence gathered at the racetrack describing the unknown sample as “something for pain control.” However, cyclic peptides are known to be a unique class of substances whose confined and rigid three-dimensional structures give rise to diverse biological/pharmacologic actions, ranging from antibiotic [11, 38], immunosuppressive [39, 40], and analgesic [41–44] activity to somatostatin mimicry [45–47] and promotion of leanness [48]. The last three types of action could enhance athletic performance in equine athletes. The cyclic peptide identified in this study might have performance-enhancing effects, which could account for why it was found in a barn at a racetrack, and it might have been abused in racehorses, drivers and/or jockeys. The identified cyclic tetrapeptide might be a “designer” drug, or it might be of natural origin.
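As a consistency check on the search queries, the molecular formula and nominal molecular weight can be recomputed from the residue compositions. The Python sketch below is illustrative only and is not taken from the original work: the function names are hypothetical, the residue compositions and atomic masses are standard values, and the C6H9 substituent is assumed to replace one hydrogen atom of the peptide.

```python
# Illustrative sketch only; function names are hypothetical.
from collections import Counter

# Standard monoisotopic atomic masses (Da).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# Elemental compositions of amino acid residues (amino acid minus H2O).
RESIDUES = {
    "Arg": {"C": 6, "H": 12, "N": 4, "O": 1},
    "Lys": {"C": 6, "H": 12, "N": 2, "O": 1},
    "Gln": {"C": 5, "H": 8, "N": 2, "O": 2},
    "Phe": {"C": 9, "H": 9, "N": 1, "O": 1},
}


def cyclic_peptide_formula(residues, substituent=None):
    """Sum residue compositions; a cyclic peptide has no terminal H2O to add.

    Assumption: an optional substituent replaces one H atom of the peptide.
    """
    total = Counter()
    for name in residues:
        total.update(RESIDUES[name])
    if substituent:
        total.update(substituent)
        total["H"] -= 1  # the H displaced by the substituent
    return dict(total)


def monoisotopic_mass(formula):
    """Monoisotopic mass (Da) of an elemental composition."""
    return sum(MONO[element] * count for element, count in formula.items())


formula = cyclic_peptide_formula(
    ["Arg", "Lys", "Gln", "Phe"], substituent={"C": 6, "H": 9}
)
mass = monoisotopic_mass(formula)
print(formula)
print(round(mass, 4))
```

With these assumptions the composition comes out as C32H49N9O5, and the monoisotopic mass has a nominal value of 639, in agreement with the formula and molecular weight used in the searches.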

5 Conclusion

The unknown clear liquid in the seized syringe has been determined to contain a major component identified as a cyclic tetrapeptide, cyclo[Arg-Lys-N(C6H9)Gln-Phe], in which the N is the amide nitrogen atom bonded to Lys and the α-amino nitrogen of Gln. The chemical structure of the C6H9 substituent on the N atom has not yet been determined, but the most likely structure is proposed to be a 4- or 3-cyclohexenyl group. The identified cyclic peptide has not been documented in the literature and its true pharmacological effects are unknown, but it might be a “designer” drug with athletic performance-enhancing effects.

To the authors’ knowledge, this is the first time that ETD has been used for sequencing cyclic peptides. ETD of the triply protonated cyclic tetrapeptide produced two series of product ions, named bp- and zp-type ions, whose origins are interpreted by the proposed fragmentation pathways. The bp and zp product ions should be helpful for de novo sequencing of cyclic peptides.

Preferential loss of a protonated Lys residue from the tryptically digested cyclic tetrapeptide was observed by CID, and accounted for by amino acid side chain-assisted fragmentation. Additionally, CID MSn cannot provide complete sequence information for cyclic peptides containing Arg and Lys residues.