Sequence Elucidation of an Unknown Cyclic Peptide of High Doping Potential by ETD and CID Tandem Mass Spectrometry
- Cite this article as:
- Guan, F., Uboh, C.E., Soma, L.R. et al. J. Am. Soc. Mass Spectrom. (2011) 22: 718. doi:10.1007/s13361-011-0080-5
Identification of an unknown substance without any information remains a daunting challenge despite advances in chemistry and mass spectrometry. However, an unknown cyclic peptide in a sample with very limited volume seized at a Pennsylvania racetrack has been successfully identified. The unknown sample was determined by accurate mass measurements to contain a small unknown peptide as the major component. Collision-induced dissociation (CID) of the unknown peptide revealed the presence of Lys (not Gln, by accurate mass), Phe, and Arg residues, and the absence of any y-type product ion. The latter, together with the tryptic digestion results (an unusual deamidation and the absence of any tryptic cleavage), suggests a cyclic structure for the peptide. Electron-transfer dissociation (ETD) of the unknown peptide indicated the presence of Gln (not Lys, by the unusual deamidation), Phe, and Arg residues and their connectivity. After all the results were pieced together, a cyclic tetrapeptide, cyclo[Arg-Lys-N(C6H9)Gln-Phe], is proposed for the unknown peptide. The observation of different amino acid residues in the CID and ETD experiments is explained by a proposed fragmentation pathway, as is the preferential CID loss of a Lys residue from the peptide. ETD was used for the first time in sequencing of a cyclic peptide; product ions resulting from ETD of the identified peptide were categorized into two types, named pseudo-b and pseudo-z ions, that are important for sequencing of cyclic peptides. The ETD product ions are interpreted by proposed fragmentation pathways. Additionally, multi-stage CID mass spectrometry cannot provide complete sequence information for cyclic peptides containing adjacent Arg and Lys residues. The identified cyclic peptide has not been documented in the literature and its pharmacological effects are unknown, but it might be a “designer” drug with athletic performance-enhancing effects.
Key words: Unknown cyclic peptide, Unknown sample, Doping analysis, Equine drug testing, Electron transfer dissociation, Collision-induced dissociation, Accurate mass, Multiple stage CID
Identification of chemical composition and structure of an unknown substance without any information remains a challenging task despite advances in chemistry and mass spectrometry. Such challenges are occasionally encountered in forensic chemistry and doping analysis in human and equine sports, where discoveries of “designer” drugs of small molecules that were used to evade detection have been reported [1–3]. Analyses of unknown samples containing substances, which are not known to analytical chemists at the time of analyses but already known and reported in the literature, may be feasible if initial analyses can reveal possible candidates and relevant reference standards are available for comparison. However, analyses of samples with totally unknown substances not documented in the literature are far more difficult.
Recently, we undertook such a difficult analysis when the Pennsylvania (PA) Harness Racing Commission submitted for analysis/identification a 6-mL plastic syringe containing an unknown clear liquid (~200 μL) seized during a barn search at a racetrack in PA. The intelligence gathered on the unknown sample suggested “the syringe contained some form of pain medication or morphine.” No additional information was provided. Initial accurate-mass spectrometric analysis of the unknown liquid excluded the presence of any morphine. Lack of additional information compounded the difficulty in identifying the unknown sample. Nevertheless, our initial investigation by mass spectrometry suggested that the unknown liquid contained a cyclic peptide as the major component. Although sequence determination of linear peptides can be routinely achieved by mass spectrometry [4–6], sequencing of cyclic peptides is complicated. There have been reports on sequencing of cyclic peptides by fast atom bombardment (FAB) [7–12], Fourier transform ion cyclotron resonance (FTICR), triple quadrupole, and ion trap [15–18] mass spectrometry, and reviews [19, 20]. However, studies in most of the reports used known cyclic peptides to develop sequencing methodologies [8–10, 12–15, 17], while only a few publications reported sequencing of unknown cyclic peptides [11, 21–23]. Even in those few publications, quantities of the samples involved were sufficient for the chemical processing, Edman degradation analysis, and nuclear magnetic resonance (NMR) measurements.
In the present study, the volume of the sample was very limited (~200 μL) and concentration of the unknown cyclic peptide was low (estimated to be <100 μg/mL). These limitations excluded utilization of other techniques such as nuclear magnetic resonance (NMR) and acid hydrolysis followed by Edman degradation that were used as supplementary tools for sequencing of cyclic peptides in previous publications. Nonetheless, we have determined the sequence of the unknown cyclic peptide by collision-induced dissociation (CID) and electron transfer dissociation (ETD) tandem mass spectrometry. Additionally, multi-stage CID mass spectrometry was evaluated for sequencing cyclic peptides containing Arg and Lys residues.
Three cyclic peptides, cyclo[Arg-Ala-Asp-D-Phe-Lys] (cyclo[RADfK]), cyclo[Arg-Gly-Asp-D-Phe-Lys] (cyclo[RGDfK]), and cyclo[Arg-Gly-Asp-D-Tyr-Lys] (cyclo[RGDyK]) were purchased from AnaSpec (Fremont, CA, USA). Each of them was dissolved in acetonitrile/water (50/50, vol/vol) to give a stock solution of 1.0 mg/mL. A working solution of each cyclic peptide at 10 μg/mL was prepared by diluting the stock solution with acetonitrile/water/formic acid (50/50/0.1, vol/vol/vol).
Acetonitrile and water (both Optima grade) were obtained from Fisher Scientific (Pittsburgh, PA, USA), formic acid (99%) was from EMD Chemicals via Scientific Equipment Co. (Aston, PA, USA), and trypsin (sequencing grade modified) was from Promega (Madison, WI, USA).
2.2 Preparation of the Unknown Sample for LC-MS Analyses
The sample was diluted to 1/10 (10 μL/100 μL) or 1/25 (8 μL/200 μL) with H2O/acetonitrile/formic acid (50/50/0.1, vol/vol/vol) for syringe infusion or LC-MS analyses.
2.3 Tryptic Digestion
An aliquot (8 μL) of the undiluted unknown sample was transferred to a 1.5-mL microcentrifuge tube (Fisher Scientific) containing 164 μL of 50 mM ammonium bicarbonate (pH 7.8). To the mixture was added 20 μL of trypsin in the bicarbonate buffer (20 μg/100 μL). The mixture was shaken briefly and then incubated in a water bath at 37 °C for 3 h. The digestion reaction was quenched by adding 8 μL of 10% formic acid, and the digest was analyzed immediately or stored at –70 °C until analyzed.
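As a quick check on the digestion recipe above, the following Python sketch totals the volumes and works out the trypsin amount, the final formic acid concentration, and the overall dilution of the sample. The variable and function names are illustrative only and are not part of the original protocol; the formic acid figure assumes the 10% stock is simply diluted by volume.

```python
# Volumes (in microliters) and trypsin concentration as stated in the protocol.
SAMPLE_UL = 8.0
BUFFER_UL = 164.0
TRYPSIN_UL = 20.0
QUENCH_UL = 8.0                      # 10% formic acid
TRYPSIN_UG_PER_UL = 20.0 / 100.0     # 20 ug per 100 uL


def digest_summary():
    """Return derived quantities for the tryptic digestion mixture."""
    digest_volume = SAMPLE_UL + BUFFER_UL + TRYPSIN_UL   # before quenching
    final_volume = digest_volume + QUENCH_UL             # after quenching
    return {
        "digest_volume_ul": digest_volume,
        "final_volume_ul": final_volume,
        "trypsin_ug": TRYPSIN_UL * TRYPSIN_UG_PER_UL,
        # Assumption: simple volumetric dilution of the 10% formic acid stock.
        "final_formic_acid_pct": QUENCH_UL * 10.0 / final_volume,
        "sample_dilution_factor": final_volume / SAMPLE_UL,
    }


for name, value in digest_summary().items():
    print(f"{name}: {value:g}")
```

Under these assumptions the quenched digest has the same 25-fold dilution as the 1/25 (8 μL/200 μL) dilution used for the undigested sample.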
2.4 LC-MS Analyses
All mass spectrometric, tandem mass spectrometric or LC-MS/MS analyses were conducted using either an LTQ XL linear ion trap with ETD (Thermo Fisher Scientific, San Jose, CA) or a high resolution and high mass accuracy hybrid LTQ Orbitrap mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) with an IonMax electrospray ionization (ESI) source interfaced to a Surveyor Plus liquid chromatograph with an online degasser and a Surveyor Plus auto-sampler. The linear ion trap was mass calibrated monthly, and the LTQ Orbitrap weekly using calibration standards as per instrument manuals. Mass accuracy of the Orbitrap was better than 5 ppm with external calibration, according to the manufacturer’s specification. The mass accuracy in both MS and MS/MS modes was checked daily with syringe infusion of individual clenbuterol and bradykinin fragment 2–9 solutions (each 1 μg/mL), and it was determined to be within 2 ppm. The Orbitrap was operated at a resolving power of 100,000 (FWHM, at m/z 400) unless specified otherwise.
LC separations, when necessary, were carried out using a wide-pore Zorbax 300SB C18 column (50 × 1.0 mm i.d., 3.5 μm) with a Zorbax StableBond guard column (17 × 1.0 mm i.d., 5 μm; Agilent, Wilmington, DE, USA) maintained at 26 °C. A mobile phase gradient was used for elution of the unknown peptide and its tryptic digest, and it consisted of Solvent A (H2O/acetonitrile/formic acid, 95/5/0.1, vol/vol/vol) and B (H2O/acetonitrile/formic acid, 5/95/0.1, vol/vol/vol). The gradient was programmed as follows: 0% B (0–3 min) was increased to 2% B (3–5 min), to 50% B (5.0–5.5 min) and to 80% B (5.5–6.0 min), held at 80% B (6.0–10.0 min), decreased to 0% B (10.0–10.5 min), and held at 0% B (10.5–16.0 min). The mobile phase flow was also programmed: 50 μL/min (0–5.5 min) was increased to 100 μL/min (5.5–6.0 min), held at 100 μL/min (6.0–15.0 min), decreased to 50 μL/min (15.0–15.5 min), and held at 50 μL/min (15.5–16.0 min). Twenty microliters of the diluted unknown sample or its tryptic digest was injected for analysis.
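The gradient and flow programs described above can be written as breakpoint tables, which makes it easy to look up the mobile phase composition at any time. The sketch below is illustrative only: it assumes linear ramps between the stated breakpoints (the text does not say how the pump interpolates), and the function names are hypothetical.

```python
# (time in min, value) breakpoints transcribed from the gradient description.
PCT_B_POINTS = [(0.0, 0.0), (3.0, 0.0), (5.0, 2.0), (5.5, 50.0),
                (6.0, 80.0), (10.0, 80.0), (10.5, 0.0), (16.0, 0.0)]
FLOW_POINTS = [(0.0, 50.0), (5.5, 50.0), (6.0, 100.0),
               (15.0, 100.0), (15.5, 50.0), (16.0, 50.0)]


def interpolate(t_min, points):
    """Piecewise-linear lookup (assumed ramp shape) over breakpoints."""
    if t_min <= points[0][0]:
        return points[0][1]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t_min <= t1:
            return v0 + (v1 - v0) * (t_min - t0) / (t1 - t0)
    return points[-1][1]


def percent_b(t_min):
    return interpolate(t_min, PCT_B_POINTS)


def flow_ul_per_min(t_min):
    return interpolate(t_min, FLOW_POINTS)


def acetonitrile_pct(t_min):
    """Overall acetonitrile content: Solvent A is 5% and Solvent B is 95%."""
    b = percent_b(t_min)
    return (100.0 - b) * 0.05 + b * 0.95


for t in (0.0, 4.0, 5.75, 8.0, 12.0):
    print(f"t={t:5.2f} min  %B={percent_b(t):5.1f}  "
          f"ACN={acetonitrile_pct(t):5.1f}%  flow={flow_ul_per_min(t):5.1f} uL/min")
```

Because Solvent A already contains 5% acetonitrile and Solvent B only 95%, the 80% B plateau corresponds to about 77% acetonitrile rather than 80%.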
Both the linear ion trap and LTQ Orbitrap were operated in the positive ion mode. The ion transfer capillary temperature of each instrument was set at 275 °C for syringe infusion experiments, and at 325 °C for LC-MS analyses. ESI source parameters for both instruments were optimized with infusion of bradykinin fragment 2–9 solution (1 μg/mL), for syringe infusion and LC flow rates at 5 and 50 μL/min, respectively. For CID experiments using the LTQ XL and LTQ Orbitrap, the isolation width was set at 1.5, the standard activation Q value (0.25) and activation time (30 ms) were used, and wideband activation was enabled. Helium (dampening gas) was used as the collision gas for CID. For ETD experiments with the ion trap instrument, fluoranthene was employed to generate reagent anions. Reagent ion source settings were as follows: temperature, 160 °C; emission current, 50 μA; electron energy, –70 V; chemical ionization (CI) pressure, 20 psi; max inject time, 50 ms; automatic gain control (AGC) target, 3 × 10^5. The isolation width was set at 1.5, and the activation time at 100 ms. Data acquisition and analysis were accomplished with Xcalibur software (ver. 2.0.6, Thermo Electron Corp., San Jose, CA, USA).
3.1 Analysis of the Unknown Sample by Full-Scan Mass Spectrometry
Table 1. Accurate mass values for the mass spectral peaks (except those of the unknown peptide) recorded with the unknown sample (Figure 1) and their possible relevance to known drugs. Columns: relative intensity (%); formula of drug(s); delta mass (ppm). Formula entries: C22H29FO5 (betamethasone, dexamethasone, paramethasone); C20H28O3S (ethyl dibunate).
3.2 Exploring Amino Acid Sequence of the Unknown Peptide with CID MS/MS
Table 2. Interpretation of CID product ion peaks of the doubly charged unknown peptide and its digest from accurate-mass measurements. Columns: product ion peak; two relevant peaks; residue or group. Entries: 2 × 320.6998 − 512.2978; FRK (or any combination); 2NH3 + CO; 2 × 321.1922 − 513.2822; FRK (or any combination); 478.2446 − 331.1762; H2O + NH3 + CO.
In the product ion spectrum of the doubly charged unknown peptide (top panel in Figure 2), the peaks at m/z 175.0755 and 147.0806 are not Arg y1 and Lys y1 product ions (theoretical m/z 175.1190 and 147.1128, respectively) but relate to the singly charged species of m/z 209.1276 mentioned above. This species loses two NH3 molecules to generate the ion of m/z 175.0755 (209.1276 − 175.0755 = 34.0521 versus a theoretical mass of 34.0531 for two NH3 molecules) and further loses a CO molecule, resulting in the ion of m/z 147.0806 (175.0755 − 147.0806 = 27.9949 versus 27.9949 for a CO molecule). Similarly, the peak at m/z 164.1072 in the product ion spectrum also relates to the singly charged species of m/z 209.1276; the latter loses an NH3 molecule and a CO molecule, resulting in the ion of m/z 164.1072 (209.1276 − 164.1072 = 45.0204 versus 45.0214 for an NH3 molecule and a CO molecule).
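The neutral-loss arithmetic in this paragraph can be reproduced with a few lines of Python. The sketch below is only an illustration: it uses standard monoisotopic atomic masses (not values taken from the paper), and the helper names are hypothetical. It compares each observed loss from m/z 209.1276 with the theoretical loss, and shows how far the same peaks would be from the Arg y1 and Lys y1 values.

```python
# Standard monoisotopic atomic masses in u (assumed, not from the paper).
H, C, N, O = 1.007825, 12.000000, 14.003074, 15.994915

NH3 = N + 3 * H   # 17.026549
CO = C + O        # 27.994915


def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def loss_deviation_mda(precursor_mz, fragment_mz, theoretical_loss):
    """Observed minus theoretical neutral loss in mDa (singly charged ions)."""
    return ((precursor_mz - fragment_mz) - theoretical_loss) * 1000.0


PRECURSOR = 209.1276
CANDIDATE_LOSSES = {
    175.0755: ("2 NH3", 2 * NH3),
    164.1072: ("NH3 + CO", NH3 + CO),
    147.0806: ("2 NH3 + CO", 2 * NH3 + CO),
}

for fragment, (label, loss) in CANDIDATE_LOSSES.items():
    dev = loss_deviation_mda(PRECURSOR, fragment, loss)
    print(f"{PRECURSOR} -> {fragment}: {label}, deviation {dev:+.2f} mDa")

# Read as y1 ions, the same peaks would be off by hundreds of ppm.
print(f"as Arg y1: {ppm_error(175.0755, 175.1190):.0f} ppm")
print(f"as Lys y1: {ppm_error(147.0806, 147.1128):.0f} ppm")
```

All three losses agree with theory to within about 1 mDa, whereas the y1 assignments miss by more than 200 ppm, far outside the 2 to 5 ppm accuracy quoted for the Orbitrap.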
All major peaks were accounted for, except one at m/z 321.2019 in the product ion spectrum of the doubly charged unknown peptide. Presence of only b-type product ions and absence of y-type product ions in the product ion spectrum suggest a cyclic sequence for the unknown peptide, because a cyclic peptide, in theory, does not generate any y ions due to the lack of a C-terminal hydroxyl group [24, 25].
3.3 Characterizing the Unknown Peptide by Tryptic Digestion
The unknown peptide contains Lys and Arg residues and may be cyclic; tryptic digestion is a useful and convenient tool for verifying a cyclic sequence, based on the known resistance of cyclic peptides to tryptic digestion [26–28]. The unknown peptide was subjected to tryptic digestion and subsequent LC-MS analysis. No tryptic fragments were detected from the digestion, which was consistent with a cyclic nature of the unknown peptide. Unexpectedly, deamidation of the unknown peptide was observed from the digestion. In the full scan accurate-mass spectrum of the “tryptic digest” of the unknown peptide (S-Figure 2), peaks of the singly and doubly charged species at m/z 641.3772 and 321.1922 were observed, and they are 0.9850 and 0.4924 mass unit higher than those of the singly and doubly charged unknown peptide, respectively. The observed mass increase of 0.9850 Da resulting from trypsin digestion of the unknown peptide indicates that it was deamidated (–NH3 + H2O = –17.0265 + 18.0106 = 0.9841). Furthermore, comparison between product ion spectra of the doubly charged unknown peptide and its “tryptic digest” (Figure 2 and Table 2) also indicates occurrence of the deamidation. Specifically, upon CID the doubly charged unknown peptide loses both an NH3 molecule and a protonated Lys residue, leading to the peak at m/z 495.2716 (top panel in Figure 2). In contrast, the doubly charged “tryptic digest” of the unknown peptide loses an H2O molecule (513.2822 − 495.2712 = 18.0110) after losing a protonated Lys residue (2 × 321.1922 − 513.2822 = 129.1022, bottom panel in Figure 2). The above interpretation of the product ion spectra confirmed deamidation of the unknown peptide resulting from the tryptic digestion. Such deamidation is unusual and unexpected because trypsin is well known for its specificity of cleaving only at the C-terminal side of Arg or Lys residues in a peptide or protein, but is not known for deamidation activity.
However, rare cases of deamidation of Gln/Asn in proteins by trypsin have been reported, where the tertiary structure is the principal determinant of the deamidation. In the present study, the deamidation of the unknown peptide suggests that it might contain either a Gln or an Asn residue.
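The two mass relationships used in this section, the deamidation shift and the mass of the species lost from a doubly charged precursor, can be checked numerically. The Python sketch below is an illustration under stated assumptions: atomic masses are standard monoisotopic values, the proton mass is taken as 1.007276 u, and the function names are hypothetical.

```python
# Standard monoisotopic masses in u (assumed, not taken from the paper).
H, C, N, O = 1.007825, 12.000000, 14.003074, 15.994915
PROTON = 1.007276

H2O = 2 * H + O
NH3 = N + 3 * H
DEAMIDATION_SHIFT = H2O - NH3                 # amide -> acid, about +0.9840 u
LYS_RESIDUE = 6 * C + 12 * H + 2 * N + O      # C6H12N2O


def mz_shift(mass_shift, charge):
    """Shift in m/z produced by a given mass change at a given charge state."""
    return mass_shift / charge


def lost_cation_mass(precursor_mz, product_mz):
    """Mass of the singly charged species lost when a doubly charged
    precursor yields a singly charged product ion."""
    return 2 * precursor_mz - product_mz


print(f"deamidation shift, 1+: {mz_shift(DEAMIDATION_SHIFT, 1):.4f} (observed 0.9850)")
print(f"deamidation shift, 2+: {mz_shift(DEAMIDATION_SHIFT, 2):.4f} (observed 0.4924)")
print(f"protonated Lys residue: {LYS_RESIDUE + PROTON:.4f}")
print(f"lost from digest:  {lost_cation_mass(321.1922, 513.2822):.4f}")
print(f"lost from peptide: {lost_cation_mass(320.6998, 512.2978):.4f}")
```

The factor of two in `lost_cation_mass` reflects that a doubly charged ion at m/z p carries a total mass of 2p, so the charged fragment that departs has mass 2p minus the m/z of the singly charged product.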
The observed direct loss of an amino acid residue (rather than an intact amino acid or amino amide) such as a protonated Lys residue from the tryptic digest of the unknown peptide in CID described above is an indication of a cyclic peptide. All results to this point suggest that the unknown peptide is a cyclic one.
3.4 Attempts at Sequencing the Unknown Cyclic Peptide by Multi-Stage Mass Spectrometry
Multi-stage tandem mass spectrometry (MSn) is the technique of choice for sequencing cyclic peptides; it was first reported by Gross’ group, who demonstrated sequencing of seven known cyclic peptides. A recent publication also reported on sequencing of three known cyclic peptides. We attempted to sequence the unknown cyclic peptide using this technique on the LTQ XL ion trap. The first-generation product ion of m/z 495 from MS/MS of the doubly protonated unknown cyclic peptide (Table 2) was isolated in the ion trap and subjected to CID, and a second-generation product ion of m/z 348 was generated (S-Figure 4), indicating loss of a Phe residue. This product ion of m/z 348 was subjected to further CID, but the resulting product ion spectrum did not reveal loss of an amino acid residue (S-Figure 4). Another first-generation product ion of m/z 478 from MS/MS of the doubly protonated unknown cyclic peptide fragmented upon CID to a second-generation product ion of m/z 331, and the latter further to m/z 175 (S-Figure 5), indicating sequential loss of a Phe residue and an Arg residue. According to Gross et al., a protonated cyclic peptide opens the ring upon CID to generate isomeric protonated linear peptides, and the latter lose an amino acid residue from the C-terminus. The results mentioned above suggest the connectivity of Lys, Phe, and Arg residues to be RFK. CID of the first-generation product ion of m/z 432 from the doubly protonated unknown peptide resulted in two series of second-generation product ions: one indicating sequential loss of a Lys residue and a Phe residue, and the other that of an Arg residue and a Phe residue (S-Figure 6). In short, the MSn results suggest only a partial and contradictory sequence for the unknown peptide.
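The residue assignments in the MSn chains above follow from nominal mass differences between successive product ions. A minimal Python sketch of that lookup is given below; the table holds standard nominal residue masses, the function name is hypothetical, and the approach is a simplification that ignores neutral losses and modified residues.

```python
# Nominal (integer) residue masses of the common amino acids.
NOMINAL_RESIDUE = {
    "Gly": 57, "Ala": 71, "Ser": 87, "Pro": 97, "Val": 99, "Thr": 101,
    "Cys": 103, "Leu": 113, "Ile": 113, "Asn": 114, "Asp": 115,
    "Gln": 128, "Lys": 128, "Glu": 129, "Met": 131, "His": 137,
    "Phe": 147, "Arg": 156, "Tyr": 163, "Trp": 186,
}


def assign_losses(mz_chain):
    """For successive singly charged ions, list residues matching each loss."""
    assignments = []
    for parent, product in zip(mz_chain, mz_chain[1:]):
        loss = parent - product
        candidates = [name for name, mass in NOMINAL_RESIDUE.items() if mass == loss]
        assignments.append((parent, product, loss, candidates))
    return assignments


for chain in ([495, 348], [478, 331, 175]):
    for parent, product, loss, candidates in assign_losses(chain):
        print(f"{parent} -> {product}: loss {loss}, candidates {candidates}")
```

At unit resolution a loss of 128 matches both Gln and Lys, which is why accurate-mass data were needed elsewhere in this study to tell the two apart.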
3.5 Elucidation of Amino Acid Sequence of the Unknown Peptide by ETD
3.6 Amino Acid Sequence Proposed for the Unknown Peptide
3.7 Reconciling the Proposed Sequence with the Experimental Results
4.1 Preferential Loss of Lysine Residue in CID
In the CID product ion spectrum of the tryptically digested cyclic peptide (bottom panel in Figure 2), the dominant peak at m/z 513.2822 indicates preferential loss of a protonated Lys residue from the doubly protonated peptide. This loss is even more favorable than that of an H2O molecule, because the spectrum indicates that the doubly protonated peptide first loses a Lys residue (2 × m/z 321.1922 → m/z 513.2822, Table 2) and then an H2O molecule (m/z 513.2822 → 495.2712). Similar preferential loss of a Lys residue from the cyclic peptide itself is also indicated: the minute peak at m/z 512.2978 in its product ion spectrum (S-Figure 8) suggests loss of a protonated Lys residue. It has been reported that deamination and dehydration are the prominent fragmentation losses that an N-terminal glutamine of protonated linear peptides undergoes in CID. However, preferential loss of a protonated Lys residue over that of an H2O molecule was observed for the tryptically digested cyclic peptide in the present study. This preferential loss of a protonated Lys could be accounted for by the mechanism of amino acid side chain-assisted amide bond cleavage (Scheme 1). Such preferential loss of a protonated Lys residue was also observed for the cyclic peptides cyclo[RADfK], cyclo[RGDfK], and cyclo[RGDyK] (S-Figure 9 and S-Table 1), which were chosen in this study because of their commercial availability. Similar loss of a Lys or Gln residue from the cyclic b5 product ion of a known linear peptide, YAKFLG or YAQFLG, was reported by other investigators.
4.2 MSn Sequencing of Cyclic Peptides Containing Arg and Lys Residues
MSn is the technique of choice for sequencing cyclic peptides, and was evaluated for sequencing known cyclic peptides containing Arg and Lys residues, cyclo[RADfK], cyclo[RGDfK], and cyclo[RGDyK], to determine its applicability to sequencing of the unknown peptide. Doubly protonated cyclo[RADfK] of m/z 309.8 fragments upon CID to protonated RADf of m/z 490.2 and KRAD of m/z 471.2, doubly protonated cyclo[RGDfK] of m/z 302.9 to protonated RGDf of m/z 476.2 and KRGD of m/z 457.2, and doubly protonated cyclo[RGDyK] of m/z 310.8 to protonated RGDy of m/z 492.2 and KRGD of m/z 457.2 (S-Figure 9 and S-Table 1). MS/MS spectra of the first-generation product ions, RADf of m/z 490.2, RGDf of m/z 476.2, and RGDy of m/z 492.2, clearly indicate the loss of a Phe, Phe, and Tyr residue, respectively (S-Figure 10). However, MS/MS spectra of the second-generation product ions, protonated RAD of m/z 343.2, RGD of m/z 329.2, and RGD of m/z 329.2, do not obviously reveal the loss of an Asp residue (S-Figure 11). The same is true for the MS/MS spectra of the first-generation product ions, protonated KRAD of m/z 471.2, KRGD of m/z 457.2, and KRGD of m/z 457.2 (S-Figure 12). These results indicate that MSn can provide only partial sequence information for cyclic peptides containing adjacent Arg and Lys residues. Thus, MSn may not be able to provide complete sequence information for the unknown peptide, which is consistent with the experimental results.
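The singly protonated fragment m/z values quoted above can be approximated by summing residue masses and adding one proton. The Python sketch below illustrates this with standard monoisotopic residue masses; the function name is hypothetical, and small differences from the quoted values are expected because the ion trap data are reported to one decimal place.

```python
# Standard monoisotopic residue masses in u (assumed, not from the paper).
RESIDUE = {
    "R": 156.10111, "A": 71.03711, "G": 57.02146, "D": 115.02694,
    "F": 147.06841, "Y": 163.06333, "K": 128.09496,
}
PROTON = 1.007276

# Values quoted in the text for the singly protonated fragments.
REPORTED = {
    "RADf": 490.2, "KRAD": 471.2, "RGDf": 476.2, "KRGD": 457.2,
    "RGDy": 492.2, "RAD": 343.2, "RGD": 329.2,
}


def protonated_fragment_mz(sequence):
    """m/z of a singly protonated fragment made of the given residues.
    Lowercase letters denote D-amino acids, which have the same mass."""
    return sum(RESIDUE[aa.upper()] for aa in sequence) + PROTON


for seq, reported in REPORTED.items():
    calc = protonated_fragment_mz(seq)
    print(f"{seq:5s} calculated {calc:8.3f}  reported {reported:6.1f}")
```

Because only residue masses are summed, with no added water, the calculation treats these fragments as b-type ions.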
4.3 Sequencing of Cyclic Peptides by ETD
Peptides have conventionally been sequenced by a combination of chemical or enzymatic degradation, mass spectrometry, and NMR, but a certain quantity of peptide is required. As modern mass spectrometry advances, linear peptides in small quantity can be easily sequenced. However, sequencing of cyclic peptides is complicated. To our knowledge, there has been no report on the use of ETD for sequencing of a cyclic peptide, although ETD has been utilized in sequencing of linear peptides, and cyclic peptides have been used in studying the fragmentation mechanism of electron capture dissociation (ECD) [31, 35]. The result of the present study demonstrates that ETD is useful for sequencing a cyclic peptide.
The major product ions in the ETD product ion spectrum of the cyclic tetrapeptide (top panel in Figure 3) can be categorized into two series: m/z 277.1 and 433.2; m/z 209.1, 365.3, and 512.3. The first series of product ions (m/z 277.1 and 433.2) are named pseudo-z ions in the present study because they are similar in chemical structure to relevant z-type product ions from a linear peptide but one mass unit less than the z-type product ions due to replacement of the carboxylic group with the hydroxyliminomethyl group (QF z2p ion in Scheme 3, for example). The m/z value of a pseudo-z ion equals that of the relevant b ion plus 1 (b ion – NH + NH + H, see the structure of QF z2p ion in Scheme 3) if the charge state of the former is 1+. This knowledge would assist with recognition of pseudo-z ions. The other series of product ions (m/z 209.1, 365.3, and 512.3) is named pseudo-b product ions because they are the same in mass as relevant b-type product ions from a linear peptide but different in chemical structure from the b-type ions (b1p ion in Scheme 3, for example). Similar to c and z product ions, bp and zp product ions are even- and odd-electron ions, respectively. The bp- and zp-type ions are useful for sequencing a cyclic peptide, as are c- and z-type product ions for sequencing a linear peptide. For instance, the peaks at m/z 209.1, 365.3, and 512.3 in the top panel of Figure 3 correspond to b1p, b2p, and b3p product ions, respectively. Based on these bp ions, a partial sequence proposed would be N(C6H9)Lys-Arg-Phe, which is the retro-sequence of the correct partial sequence for the peptide. The peaks at m/z 113.1, 277.1, and 433.2 in the top panel of Figure 3 relate to z1p, z2p, and z3p product ions, respectively. With these zp ions, a partial sequence proposed would be Arg-Phe-Gln, which is also the retro-sequence of the correct partial sequence.
In short, the sequence proposed from bp and zp ion peaks in an ETD product ion spectrum of an unknown cyclic peptide is its retro sequence. By reversing the retro sequence, the correct sequence can be obtained. It should be noted that the above proposal for sequencing of an unknown cyclic peptide is preliminary because it is based on the results from one unknown peptide. Studies on more cyclic peptides are required to validate the proposal.
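The two rules stated above, that a singly charged pseudo-z ion lies one unit above the corresponding b ion and that the sequence read from the ladder must be reversed, can be sketched in Python as follows. This is an illustration only: residue masses are standard monoisotopic values, the "+1" is applied as a nominal offset exactly as worded in the text, the composition Gln + Phe + Arg for z3p is inferred here from the 156.1 spacing, and all function names are hypothetical.

```python
# Standard monoisotopic residue masses in u (assumed, not from the paper).
RESIDUE = {"Gln": 128.05858, "Phe": 147.06841, "Arg": 156.10111}
PROTON = 1.007276


def b_ion_mz(residues):
    """m/z of a singly charged b-type ion built from the given residues."""
    return sum(RESIDUE[r] for r in residues) + PROTON


def pseudo_z_mz(residues):
    """Pseudo-z m/z using the rule from the text: b ion plus 1 (nominal)."""
    return b_ion_mz(residues) + 1.0


def reverse_retro(retro_sequence):
    """Reverse a retro sequence read from the bp or zp ladder."""
    return list(reversed(retro_sequence))


print(f"z2p (Gln, Phe):      {pseudo_z_mz(['Gln', 'Phe']):.1f}")
print(f"z3p (Gln, Phe, Arg): {pseudo_z_mz(['Gln', 'Phe', 'Arg']):.1f}")
print("corrected order:", "-".join(reverse_retro(["Arg", "Phe", "Gln"])))
```

The calculated values fall on the m/z 277.1 and 433.2 peaks assigned to the pseudo-z series, and reversing Arg-Phe-Gln gives Gln-Phe-Arg, which is contiguous in the proposed cyclic sequence.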
4.4 Possible Pharmacological Effects of the Identified Cyclic Peptide
Now that the major component of the unknown sample was identified as a cyclic tetrapeptide, the next relevant question would be what the pharmacologic effects of this cyclic peptide are. To address this question, extensive database searches were conducted: searches with SciFinder Scholar against the Chemical Abstracts and Medline using queries of molecular formula (C32H49N9O5), Substructure Search and Similarity Search with the cyclo[Arg-Lys-Gln-Phe] structure drawn; searches against the National Center for Biotechnology Information’s (NCBI) Protein Database with a query of Molecular Weight (639) filtered by Sequence Length (4). No exact match to the sequence proposed above or to cyclo[Arg-Lys-Gln-Phe] was found from the searches. Consequently, the pharmacologic effects of the cyclic tetrapeptide remain unknown except for the uncorroborated intelligence gathered at the racetrack on the unknown sample as “something for pain control.” However, it is known that cyclic peptides are a unique class of substances with diverse biological/pharmacologic actions resulting from confined and rigid three-dimensional structures, which range from antibiotics [11, 38], immunosuppression [39, 40], analgesics [41–44], somatostatin mimics [45–47], to facilitating leanness. The last three types of actions could enhance athletic performance in equine athletes. The cyclic peptide identified in this study might have performance-enhancing effects, which could account for why it was found in a barn at a racetrack and might have been abused in racehorses, drivers, and/or jockeys. The identified cyclic tetrapeptide might be a “designer” drug, or of natural origin.
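The molecular formula used in the database queries can be tied back to the measured masses with a short calculation. The Python sketch below is illustrative: it uses standard monoisotopic atomic masses, a proton mass of 1.007276 u, a deliberately simple formula parser limited to C, H, N, and O, and hypothetical function names. It compares the calculated doubly protonated m/z with the value of 320.6998 reported for the doubly charged peptide.

```python
import re

# Standard monoisotopic atomic masses in u (assumed, not from the paper).
MONOISOTOPIC = {"C": 12.000000, "H": 1.007825, "N": 14.003074, "O": 15.994915}
PROTON = 1.007276


def monoisotopic_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C32H49N9O5'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def protonated_mz(neutral_mass, charge):
    """m/z of the [M + nH]n+ ion."""
    return (neutral_mass + charge * PROTON) / charge


def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6


mass = monoisotopic_mass("C32H49N9O5")
mz2 = protonated_mz(mass, 2)
print(f"neutral monoisotopic mass: {mass:.4f}")
print(f"[M+2H]2+ calculated: {mz2:.4f}")
print(f"error vs 320.6998: {ppm_error(320.6998, mz2):+.2f} ppm")
```

The nominal mass of 639 used in the NCBI query and the sub-2 ppm agreement at the 2+ charge state are both consistent with this formula.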
The unknown clear liquid in the seized syringe has been determined to contain a major component identified as a cyclic tetrapeptide, cyclo[Arg-Lys-N(C6H9)Gln-Phe], in which the N is the amide nitrogen atom to Lys and belongs to the α-amino group of Gln. The chemical structure of the C6H9 substituent on the N atom has not been determined yet, but the most likely structure is proposed as a 4- or 3-cyclohexenyl group. The identified cyclic peptide has not been documented in the literature and its true pharmacological effects are unknown, but it might be a “designer” drug with athletic performance-enhancing effects.
To the authors’ knowledge, this is the first time that ETD has been used for sequencing cyclic peptides. ETD of the triply protonated cyclic tetrapeptide resulted in two series of product ions that are named bp- and zp-type ions, respectively, and their origins are interpreted by the proposed fragmentation pathways. The bp and zp product ions will be helpful for de novo sequencing of cyclic peptides.
Preferential loss of a protonated Lys residue from the tryptically digested cyclic tetrapeptide was observed by CID, and accounted for by amino acid side chain-assisted fragmentation. Additionally, CID MSn cannot provide complete sequence information for cyclic peptides containing Arg and Lys residues.
The authors acknowledge financial support provided by The Pennsylvania Racing Commissions, and financial contributions made by the Pennsylvania Horsemen Association at Pocono Downs and Chester Downs, Meadows Standardbred Owners Association, Horsemen Benevolent and Protective Association at Penn National, and Presque Isle Downs. The authors thank ACD/Labs for use of ACD ChemSketch Freeware, and NCBI for granting access to the protein database.