Amino Acids, Volume 40, Issue 1, pp 15–28

Venomics: a new paradigm for natural products-based drug discovery

Authors

  • Irina Vetter
    • Institute for Molecular Bioscience, The University of Queensland
  • Jasmine L. Davis
    • Institute for Molecular Bioscience, The University of Queensland
  • Lachlan D. Rash
    • Institute for Molecular Bioscience, The University of Queensland
  • Raveendra Anangi
    • Institute for Molecular Bioscience, The University of Queensland
  • Mehdi Mobli
    • Institute for Molecular Bioscience, The University of Queensland
  • Paul F. Alewood
    • Institute for Molecular Bioscience, The University of Queensland
  • Richard J. Lewis
    • Institute for Molecular Bioscience, The University of Queensland
Review Article

DOI: 10.1007/s00726-010-0516-4

Cite this article as:
Vetter, I., Davis, J.L., Rash, L.D. et al. Amino Acids (2011) 40: 15. doi:10.1007/s00726-010-0516-4

Abstract

The remarkable potency and pharmacological diversity of animal venoms have made them an increasingly valuable source of lead molecules for drug and insecticide discovery. Nevertheless, most of the chemical diversity encoded within these venoms remains uncharacterized, despite decades of research, in part because only small quantities of venom are available. However, recent advances in the miniaturization of bioassays and improvements in the sensitivity of mass spectrometry and NMR spectroscopy have allowed unprecedented access to the molecular diversity of animal venoms. Here, we discuss these technological developments in the context of establishing a high-throughput pipeline for venoms-based drug discovery.

Keywords

Drug discovery, Animal venoms, Venomics, Venom peptides, Natural products

Introduction

The drug discovery pipeline is critically dependent on access to diverse molecular libraries capable of providing high-quality leads. A poor choice of libraries at the discovery phase leads to wasted resources and no drugs. It is therefore surprising that modern drug discovery approaches have focussed narrowly on libraries of small molecules that attempt to adhere to Lipinski’s rules for drug-like properties and that can be synthesized by combinatorial chemistry and screened in a high-throughput manner (Koehn and Carter 2005). Despite the large numbers of molecules they contain, these libraries often lack structural diversity, as they tend to be centered around a limited number of “privileged” molecular scaffolds that have previously been flagged as interesting and are amenable to combinatorial chemistry (Macarron 2006).
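
Lipinski’s rule of five is usually stated as four thresholds: molecular weight ≤500 Da, calculated logP ≤5, no more than 5 hydrogen-bond donors, and no more than 10 hydrogen-bond acceptors, with poor oral absorption considered likely when more than one threshold is exceeded. The minimal Python sketch below shows such a filter. It assumes the descriptors have already been computed elsewhere, and the class name, function names, and example values are hypothetical. A disulfide-rich venom peptide of a few kilodaltons fails on molecular weight alone, which illustrates why such molecules lie outside the chemical space these libraries target.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Pre-computed molecular descriptors (this sketch does not calculate them)."""
    mol_weight: float  # molecular weight in Da
    clogp: float       # calculated octanol/water partition coefficient
    h_donors: int      # hydrogen-bond donors (sum of OH and NH groups)
    h_acceptors: int   # hydrogen-bond acceptors (sum of N and O atoms)


def lipinski_violations(d: Descriptors) -> list[str]:
    """Return the rule-of-five thresholds that a compound exceeds."""
    violations = []
    if d.mol_weight > 500:
        violations.append("MW > 500 Da")
    if d.clogp > 5:
        violations.append("cLogP > 5")
    if d.h_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def passes_rule_of_five(d: Descriptors) -> bool:
    # Poor absorption is flagged as likely only when MORE THAN ONE
    # threshold is exceeded, so a single violation is tolerated here.
    return len(lipinski_violations(d)) <= 1


# Illustrative (hypothetical) descriptor values, not measured data.
small_molecule = Descriptors(mol_weight=350.0, clogp=2.1, h_donors=2, h_acceptors=5)
venom_peptide = Descriptors(mol_weight=2600.0, clogp=-5.0, h_donors=40, h_acceptors=40)
print(passes_rule_of_five(small_molecule))  # True
print(lipinski_violations(venom_peptide))   # ['MW > 500 Da', 'H-bond donors > 5', 'H-bond acceptors > 10']
```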

The complex molecular scaffolds found in natural products constitute a vast chemical diversity that is unmatched by synthetic molecules (Clardy and Walsh 2004; Koehn and Carter 2005). Indeed, one can consider the chemical space available from natural sources as the culmination of a billion-year drug discovery program with unlimited resources. The value of natural products as sources of new drugs is highlighted by the fact that ~50% of all drugs in clinical use are of natural product origin (Paterson and Anderson 2005). The limited success of the combinatorial chemistry approach has, therefore, rekindled interest in natural products as a component of modern drug discovery efforts (Butler 2004; Ortholand and Ganesan 2004; Koehn and Carter 2005; Paterson and Anderson 2005).

Animal venoms are a rich source of natural compounds that have evolved high affinity and selectivity for a diverse range of biological targets, especially membrane proteins such as ion channels, receptors, and transporters (Lewis and Garcia 2003; Tedford et al. 2004; Fry et al. 2009). Venomics has therefore emerged as an important addition to modern drug discovery efforts (Escoubas et al. 2008; Escoubas and King 2009). In particular, the high potency and specificity of many venom-derived peptides, their ease of chemical synthesis and/or recombinant production, and the resistance of many disulfide-rich peptides to proteolytic degradation are attributes that have made them attractive drug leads (Harvey 1995; Lewis and Garcia 2003; Olivera 2006). As early as the 1970s, the antihypertensive drug captopril was developed from lead peptides isolated from the Brazilian viper Bothrops jararaca (Cushman and Ondetti 1991). More recently, a number of novel drugs have emerged from venom-based drug discovery efforts. ω-Conotoxin MVIIA (Ziconotide, Prialt®), a 25-residue peptide from the venom of the marine cone snail Conus magus, was approved by the FDA in 2004 as an analgesic for the treatment of chronic pain (Miljanich 2004), while exenatide (Byetta®), a 39-residue glucagon-like peptide-1 agonist derived from the saliva of the Gila monster (Heloderma suspectum), was approved in 2005 for the treatment of type 2 diabetes (Malone et al. 2009). Many other venom-derived peptides are in various stages of pre-clinical or clinical development (Table 1).
Table 1 Examples of venom peptides or derivatives in clinical use or under development

Animal | Peptide/protein | Pharmacology | Indication | Stage
--- | --- | --- | --- | ---
Snake | Captopril (teprotide-derived) | Inhibits angiotensin-converting enzyme (ACE) | Hypertension | FDA approved (1981)
Snake | Eptifibatide (Integrilin™) | Inhibits fibrinogen binding to the platelet glycoprotein IIb/IIIa receptor | Unstable angina | FDA approved (1998)
Cone snail | ω-Conotoxin MVIIA (Ziconotide, Prialt®) | Blocks the CaV2.2 voltage-gated calcium channel | Chronic pain | FDA approved (2004)
Lizard | Exenatide (Byetta®) | Insulin secretagogue (incretin mimetic) | Type 2 diabetes mellitus | FDA approved (2005)
Snake | Ancrod | Anticoagulant | Ischemic stroke | Phase III
Cone snail | χ-Conotoxin MrIA (Xen2174) | Inhibits the noradrenaline transporter | Chronic pain | Phase II
Scorpion | Chlorotoxin (TM-601) | Binds MMP-2 on the surface of glioma cells, impairing their invasive ability | Glioma | Phase II
Spider | Psalmotoxin-1 | Blocks acid-sensing ion channel 1a (ASIC1a) | Inflammatory pain | Pre-clinical
Spider | Tx2-6 | Nitric oxide release | Erectile dysfunction | Pre-clinical
Sea anemone | ShK | Blocks the Kv1.3 voltage-gated potassium channel | Autoimmune diseases including multiple sclerosis (MS) | Pre-clinical

Despite their promise for drug discovery, most of the chemical diversity encoded within animal venoms remains uncharacterized. However, recent technological advances that facilitate high-throughput screening (HTS) and structural characterization of venoms and venom peptides promise to accelerate the venoms-based drug discovery pipeline. In the following sections, we discuss the various elements of this pipeline, which is outlined in Fig. 1. We review recent technological developments that promise to expedite venoms-based drug discovery, and provide recommendations and caveats based on the experience gained from our own drug discovery efforts.
Fig. 1

The key elements of a venoms-based drug discovery program. A robust high-throughput screen is essential to rapidly identify venoms with desired activity and to allow subsequent isolation of bioactive molecules. An efficient toxin production system is essential not only to produce sufficient toxin for functional and structural characterization, but also to facilitate structure–activity relationship (SAR) studies. Structural characterization is no longer a major bottleneck in the discovery pipeline due to recently developed high-throughput NMR methods.

High-throughput assays

The systematic isolation and characterization of bioactives from the mixture of peptides, proteins, small molecules, and salts present in venoms is often referred to as activity-guided fractionation. This process relies critically on the selectivity, specificity, and capacity of high-throughput assays. The basic requirements of HTS include high assay sensitivity and accuracy as well as high assay robustness and reproducibility (e.g., a high Z-factor, as defined by Zhang et al. 1999). These requirements are particularly important when screening crude venoms or partially purified venom fractions. In contrast to combinatorial chemical libraries, which often comprise many thousands if not millions of compounds, natural product libraries frequently consist of crude or partially purified mixtures of compounds with diverse biological effects. This reduces the resources required for screening, but at the same time introduces the potential for interference from non-specific effects of other venom components. Thus, while the traditional goal of HTS in the pharmaceutical industry has been to increase screening capacity through ever greater automation and miniaturization, HTS in the context of venomics arguably requires greater emphasis on data quality.
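
The Z-factor of Zhang et al. (1999) summarizes both the separation between two populations of wells and their variability: Z = 1 - 3(sd1 + sd2)/|mean1 - mean2|, computed from sample and control wells (Z-factor) or from positive and negative controls (Z′-factor). Zhang et al. classed values between 0.5 and 1 as an excellent assay. A minimal Python sketch, using hypothetical control readings and a hypothetical function name:

```python
from statistics import mean, stdev


def z_factor(signal: list[float], background: list[float]) -> float:
    """Screening window coefficient of Zhang et al. (1999).

    With positive- and negative-control wells this is the Z' (Z-prime) factor;
    with test-sample wells in place of the positive controls it is the Z-factor.
    """
    separation = abs(mean(signal) - mean(background))
    if separation == 0:
        raise ValueError("means are identical; the coefficient is undefined")
    # 1 - 3(sd_signal + sd_background) / |mean_signal - mean_background|
    return 1.0 - 3.0 * (stdev(signal) + stdev(background)) / separation


# Hypothetical control-well readings (arbitrary fluorescence units).
positive_controls = [100.0, 102.0, 98.0, 101.0, 99.0]
negative_controls = [10.0, 12.0, 8.0, 11.0, 9.0]
print(round(z_factor(positive_controls, negative_controls), 3))  # 0.895
```

Because the spread of both populations enters the numerator, a crude venom that adds well-to-well variability lowers the coefficient even if the mean effect is unchanged.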

With the continuing definition of new potential drug targets, the repertoire of assays amenable to HTS has increased significantly in recent years. These include more traditional assays, such as electrophysiology, absorbance- and fluorescence-based assays, radioligand binding, and ELISAs, as well as more recent developments, such as AlphaScreen, label-free technologies, and assays based on bioluminescence, fluorescence polarization, fluorescence resonance energy transfer (FRET), bioluminescence resonance energy transfer (BRET), or scintillation proximity. In the following sections, we discuss the advantages and disadvantages of some of the most commonly used screening techniques in the context of a venoms-based drug discovery program.

Electrophysiology

Although patch-clamp or voltage-clamp electrophysiology is generally considered the gold-standard assay for assessing the functional activity of ion channels, it requires a high level of technical expertise and is generally used in low-throughput format. Recent advances include the development of automated platforms for single-cell electrophysiology studies, thus increasing throughput and simplifying assays (Bennett and Guthrie 2003). Commercially available automated high-throughput electrophysiology platforms such as the IonWorks Quattro (Molecular Devices), PatchXpress (Axon Instruments), and Flyscreen 8500 (Flyion GmbH) have enabled the development of high-throughput assays for a variety of ion channels of therapeutic interest (Hamelin et al. 2005; Dunlop et al. 2008). However, despite the advances in HTS offered by these platforms, problems relating to throughput and reproducibility still remain. These are partly due to inherent difficulties associated with using cells, such as the need for robust, high-quality cells with stable expression of the ion channel of interest and ideal patch-clamping characteristics (such as membrane seal and stability) (Dunlop et al. 2008).

AlphaScreen technology

Amplified Luminescent Proximity Homogeneous Assay or AlphaScreen technology is based on the principle of luminescent oxygen channeling (Ullman et al. 1994; Ullman et al. 1996). AlphaScreen assays were reported to have superior sensitivity and dynamic range when compared with time-resolved fluorescence and time-resolved FRET in the context of a nuclear receptor assay (Glickman et al. 2002). The ability to generate AlphaScreen beads specific to virtually any target of interest, second messenger or protein, has seen this technology applied to the detection of second messengers including cAMP and IP1, kinase activity, and receptor–ligand interactions (Taouji et al. 2009). These characteristics make AlphaScreen technology applicable to the HTS of venoms and venom components.

Label-free systems

Label-free systems such as the xCELLigence (Roche), BIND (SRU Biosystems), OWLS 210 (MicroVacuum), RAPid4 (TPP Labtech), and EPIC (Corning) platforms measure impedance, acoustic and optical resonance, or refractive index. Although they have generally been used for biochemical assays, label-free systems are increasingly being adapted to cell-based assays, in which activation of ion channels and receptors, proliferation or apoptosis, viral infection, and other cellular events lead to microstructural changes in the cell that these platforms detect. Label-free systems can avoid interference associated with the use of labels, and they enable detection of allosteric modulators with novel mechanisms of action. However, these systems may not be sensitive enough to detect small changes in cellular function, and they are potentially susceptible to interference from other venom components, yielding more false positives and false negatives. Intuitively, this approach might be better suited to more highly purified venom libraries, which would minimize such interference, although much more work is required to determine the place of label-free systems in HTS of venoms.

Radioligand binding

Radioligand binding assays have often been the method of choice for HTS of venoms because they are cheap and easily automated (Denyer et al. 1998). They rely on competition between an unlabeled venom component and a radiolabeled ligand for the same or a similar binding site on a target of interest. Consequently, radioligand binding assays are ill-suited to detecting allosteric modulators that do not share the binding site of traditional ligands (Noël et al. 2001). This is a significant caveat for the screening of voltage-gated ion channels: because multiple ligand binding sites have been described for many of these channels, radioligand binding assays would be expected to yield a particularly high number of false negatives.
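
The competition underlying these assays can be made concrete with the standard single-site, competitive binding model, sketched below in Python under that assumption (the function names and numbers are illustrative). A competitor raises the radioligand's apparent Kd by the factor (1 + [I]/Ki), and the Cheng-Prusoff relation Ki = IC50/(1 + [L]/Kd) converts an observed IC50 back to an affinity.

```python
def radioligand_occupancy(ligand: float, kd: float,
                          competitor: float = 0.0, ki: float = float("inf")) -> float:
    """Fraction of sites occupied by the radioligand at equilibrium.

    Assumes a single site at which radioligand and competitor bind in a
    mutually exclusive (competitive) manner; all concentrations in the same units.
    """
    apparent_kd = kd * (1.0 + competitor / ki)  # competitor shifts the apparent Kd
    return ligand / (apparent_kd + ligand)


def cheng_prusoff_ki(ic50: float, ligand: float, kd: float) -> float:
    """Ki from a competition IC50 (Cheng-Prusoff relation, one competitive site)."""
    return ic50 / (1.0 + ligand / kd)


# Hypothetical numbers: radioligand at its Kd (1 nM), competitor Ki = 10 nM.
kd, ligand, ki = 1.0, 1.0, 10.0
ic50 = ki * (1.0 + ligand / kd)  # 20 nM: the concentration giving 50% displacement
b0 = radioligand_occupancy(ligand, kd)
print(radioligand_occupancy(ligand, kd, ic50, ki) / b0)  # 0.5
print(cheng_prusoff_ki(ic50, ligand, kd))  # 10.0
```

A toxin acting at a distinct allosteric site has no term in this model: unless it alters the radioligand's affinity, radioligand occupancy is unchanged and the toxin goes undetected, which is the false-negative scenario described above.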

Fluorescence-based assays

Fluorescence-based assays are particularly amenable to HTS as they are robust and easily set up. Many platforms are available for fluorescence-based HTS; they differ mainly in the wavelengths they can detect and in their compound-addition mechanics. The FLIPR Tetra (Molecular Devices), Cell Lux (Perkin Elmer), and FDSS 6000 (Hamamatsu) in particular can be considered true high-throughput platforms, as they can measure fluorescence from 96 wells simultaneously.

In recent years, a plethora of fluorescent dyes has become available for measurement of intracellular calcium (e.g. Fura-2, Fluo-3, Fluo-4, and Calcium Green), sodium (e.g. SBFI, CoroNa Green, Sodium Green, and CoroNa Red), potassium (e.g. PBFI), and chloride (e.g. SPQ) ions. FRET assays based on voltage-sensitive dyes can also be used to monitor changes in membrane potential (Xu et al. 2001). These assays employ oxonol dyes as FRET acceptors and coumarin-tagged phospholipids in the outer leaflet of the cell membrane as FRET donors (Zheng et al. 2004). The fluorescent properties of these dyes are altered by the binding of their cognate ions or by a change in membrane potential, so they can be used to detect changes in ion concentration within the cell or a voltage change across the cell membrane.
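
Two simple conversions are commonly applied to such fluorescence data, sketched below in Python under stated assumptions: normalizing each trace to its pre-addition baseline (ΔF/F0), and, for single-wavelength dyes such as Fluo-4, estimating free [Ca2+] from calibration signals with the relation [Ca2+] = Kd(F - Fmin)/(Fmax - F). The Kd passed in is an assumed value (the effective Kd inside cells can differ from in vitro values), and the function names and readings are hypothetical.

```python
from statistics import mean


def delta_f_over_f0(trace: list[float], baseline_points: int) -> list[float]:
    """Normalize a fluorescence trace to its pre-addition baseline (dF/F0)."""
    f0 = mean(trace[:baseline_points])
    return [(f - f0) / f0 for f in trace]


def calcium_nm(f: float, f_min: float, f_max: float, kd_nm: float) -> float:
    """[Ca2+] for a single-wavelength dye: Kd * (F - Fmin) / (Fmax - F).

    Fmin and Fmax are the Ca2+-free and Ca2+-saturated signals from a calibration.
    """
    if not f_min < f < f_max:
        raise ValueError("F must lie strictly between Fmin and Fmax")
    return kd_nm * (f - f_min) / (f_max - f)


# Hypothetical readings; the Kd is an assumed value, not a measured one.
print(delta_f_over_f0([100.0, 100.0, 150.0, 200.0], baseline_points=2))  # [0.0, 0.0, 0.5, 1.0]
print(calcium_nm(300.0, f_min=100.0, f_max=1100.0, kd_nm=345.0))  # 86.25
```

For screening, the normalized response is usually sufficient; absolute calibration matters only when concentrations must be compared across cell lines or dyes.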

Calcium-sensitive fluorescent dyes generally give the most robust performance because of the large Ca2+ gradient across the cell membrane, and they are therefore commonly used for HTS (Zheng et al. 2004). The ubiquitous nature of the Ca2+ signal can be exploited to identify and isolate venom components active at many targets, including voltage-gated and ligand-gated ion channels (e.g. TRPV1, nicotinic acetylcholine receptors (nAChRs), and voltage-gated calcium channels) and G protein-coupled receptors (GPCRs) (Kitaguchi and Swartz 2005; Rivers et al. 2005). Although cell lines overexpressing targets of interest are typically used for HTS, it is also possible to exploit endogenously expressed ion channels or GPCRs for functional assessment of Ca2+ responses (Vetter and Lewis 2009). For example, we recently identified a novel α-conotoxin by screening fractionated crude venom for inhibition of endogenous α7 nAChR responses in human neuroblastoma cells (Fig. 2). Because the Ca2+ signal is so widespread, fluorescent Ca2+ imaging also allows the isolation and identification of venom components with unknown targets or modes of action. Indeed, many venoms elicit increases in intracellular Ca2+, and this approach has been used to study venom or venom components from the ectoparasitoid wasp Nasonia vitripennis (Rivers et al. 2005), the Portuguese Man-of-War Physalia physalis (Edwards and Hessinger 2000), the duck-billed platypus Ornithorhynchus anatinus (Kita et al. 2009), and the Trinidad chevron tarantula Psalmopoeus cambridgei (Siemens et al. 2006).
Fig. 2

Identification of a novel α-conotoxin from fractionated crude venom using a fluorescence-based HTS. SH-SY5Y human neuroblastoma cells were loaded with the calcium-sensitive fluorescent dye Fluo-4, and calcium responses were monitored for 550 s using a FLIPR Tetra. After addition of crude venom fractions, endogenously expressed α7 nAChRs were stimulated with choline (30 µM) in the presence of the positive allosteric modulator PNU120596. Wells H1–H12 are control responses. Fraction 17 (well B5, highlighted) was identified as containing an α7 nAChR inhibitor. Further sub-fractionation and screening led to the isolation of a novel α-conotoxin.
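
A hit-calling step for a plate such as the one in Fig. 2 could be sketched as follows. This is a hypothetical illustration, not the analysis used in the study: it assumes that a peak response has already been extracted for every well, treats the H-row wells as controls, and flags wells falling more than three standard deviations below the control mean, reporting their percentage inhibition. The function name, threshold, and plate values are all assumptions.

```python
from statistics import mean, stdev


def call_inhibitor_hits(responses: dict[str, float], control_wells: list[str],
                        n_sd: float = 3.0) -> dict[str, float]:
    """Return {well: % inhibition} for wells whose response falls below
    mean(controls) - n_sd * sd(controls).

    Responses are assumed to be per-well peak signals (e.g. peak dF/F0
    after agonist addition), already extracted from the traces.
    """
    control_values = [responses[w] for w in control_wells]
    ctrl_mean, ctrl_sd = mean(control_values), stdev(control_values)
    cutoff = ctrl_mean - n_sd * ctrl_sd
    hits = {}
    for well, response in responses.items():
        if well in control_wells:
            continue
        if response < cutoff:
            hits[well] = 100.0 * (1.0 - response / ctrl_mean)
    return hits


# Hypothetical plate: control wells H1-H12 plus three fraction wells.
plate = {f"H{i}": v for i, v in enumerate([1.0] * 6 + [0.9] * 3 + [1.1] * 3, start=1)}
plate.update({"A1": 0.95, "B5": 0.20, "C3": 0.85})
controls = [f"H{i}" for i in range(1, 13)]
print(call_inhibitor_hits(plate, controls))  # {'B5': 80.0}
```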

In summary, successful screening requires a well-defined assay format that can readily and reliably identify actives (hits), preferably in crude venoms or alternatively in fractionated venoms. Hits from crude venoms need to be confirmed with fractionated venom early in the isolation process because, depending on the venom and target, a high attrition rate can be encountered at this step. Recovery of activity also needs to be quantified at each step, since it can be poor for some classes of venom components, especially those that elute either near the void volume or very late in reversed-phase HPLC (rpHPLC).
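
Activity recovery can be tracked with a purification table of the kind long used in protein biochemistry: percentage recovery is the activity remaining relative to the starting material, and fold purification is the increase in specific activity (activity per unit amount). The Python sketch below assumes that activity can be expressed in consistent units at every step, however the assay defines them; the class name, step names, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PurificationStep:
    name: str
    total_activity: float  # activity units in the pooled active material
    total_amount: float    # amount of material, e.g. ug estimated by UV absorbance


def purification_table(steps: list[PurificationStep]) -> list[tuple[str, float, float]]:
    """Return (step, % recovery of activity, fold purification) for each step."""
    first = steps[0]
    start_specific = first.total_activity / first.total_amount
    rows = []
    for step in steps:
        recovery = 100.0 * step.total_activity / first.total_activity
        fold = (step.total_activity / step.total_amount) / start_specific
        rows.append((step.name, recovery, fold))
    return rows


# Hypothetical numbers for a two-step isolation.
steps = [
    PurificationStep("crude venom", total_activity=1000.0, total_amount=500.0),
    PurificationStep("rpHPLC fraction", total_activity=400.0, total_amount=20.0),
    PurificationStep("ion-exchange peak", total_activity=300.0, total_amount=3.0),
]
for name, recovery, fold in purification_table(steps):
    print(f"{name:20s} recovery {recovery:5.1f}%  purification {fold:5.1f}-fold")
```

A sharp drop in recovery at a single step, without a matching rise in fold purification, points to loss of the active component rather than removal of contaminants.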

Isolation and sequencing of active toxins

In the context of drug discovery, the isolation and characterization of venom components is carried out for two main reasons: (1) to identify and characterize the component(s) responsible for the activity observed in a bioassay; or (2) to survey the complete structural diversity of a venom with the aim of discovering new sequences with novel structural scaffolds and pharmacologies, which can then be made in sufficient quantities and screened for activity in a panel of bioassays.

Although easily defined, these tasks are not trivial, mainly because of the vast complexity of animal venoms. Snake venoms are known to contain >100 unique components (Calvete et al. 2007), while venoms from scorpions, spiders, and cone snails have recently been shown to contain anywhere from 300 to over 1,000 unique molecular entities (Pimenta et al. 2001; Escoubas et al. 2006; Newton et al. 2007; Biass et al. 2009; Davis et al. 2009). It is this remarkable complexity, often combined with small amounts of material, that poses the biggest challenge to venomics. These challenges are further compounded by the paucity of genomic data and the consequent lack of theoretical protein databases, which are instrumental for protein identification using classical proteomic approaches (Escoubas et al. 2008; Escoubas and King 2009).

The following sections discuss approaches taken to achieve the two goals outlined above, including recent technological developments that are having the greatest impact on this field.

Assay-guided fractionation

In an academic laboratory setting, once an active venom or fraction has been identified via bioassay, and provided that sufficient material is available, assay-guided isolation and characterization of the active toxin is relatively straightforward. The tandem use of orthogonal separation techniques, such as rpHPLC to separate by hydrophobicity followed by ion-exchange HPLC to separate by charge, is an efficient approach for obtaining pure peptides for amino acid sequence analysis, usually by Edman degradation. These processes, although effective, are time- and resource-consuming. If one is following up on only a few “hits” this may not be an issue; however, as HTS advances, many active fractions need to be purified and characterized to determine the most suitable leads for further study. The biggest challenge in this respect is rapid, reliable, and cost-effective peptide sequencing.

Whole-venom analysis: how many peptides?

As opposed to activity-guided fractionation, in which one is trying to isolate a limited number of components, whole-venom analysis is a much more challenging prospect, as it aims to determine the full structural diversity present in a given venom (i.e. the total number of components and their full, or at least partial, sequences). This task can be approached from two different, yet complementary, directions: proteomic/peptidomic and transcriptomic (genomic). In theory, both approaches should provide a similar, and hopefully complete, picture of a venom. The proteomic approach is based on the final product (i.e. the proteins and peptides secreted into the venom), while the transcriptomic approach is based on the genomic recipe book of potential toxins.

The first step in obtaining a holistic view of a venom is to determine exactly how complex it is, in other words, how many unique components it contains. In this respect, advances in the sensitivity, resolution, and accuracy of mass spectrometry have been the major contributors to our improved ability to determine venom complexity (Escoubas et al. 2008). The advent of soft ionization processes, including electrospray ionization (ESI) and matrix-assisted laser desorption ionization (MALDI), has allowed direct observation of molecules in crude venoms. MALDI-TOF MS is generally regarded as superior to ESI–MS for analyzing complex samples, whereas ESI, being liquid based, is more amenable to interfacing with online separation techniques such as rpHPLC. Despite this, MALDI-TOF MS of crude venoms typically yields relatively low mass counts because ion suppression prevents ionization of all molecules, particularly minor components (Escoubas et al. 2008). This problem can be largely obviated by combining one or more separation techniques with MS analysis, either online (LC–ESI–MS) or offline (rpHPLC prior to MALDI-TOF MS). This naturally raises the question of which technique is best suited to the task. Striking results have been achieved with both methods, as highlighted by two recent studies. Escoubas et al. (2006) used offline rpHPLC combined with MALDI-TOF MS to reveal more than 600 peptide masses in the venom of the male Sydney funnel-web spider Atrax robustus, and over 1,000 in the venom of the female Blue Mountains funnel-web spider Hadronyche versuta. Davis et al. (2009) used online rpHPLC ESI–MS to demonstrate that a single specimen of the cone snail Conus textile can contain as many as 1,117 unique masses. In a study of Conus consors venom that directly compared both analytical methods, the authors concluded that the techniques are in fact complementary, with only 21% of the masses being common to both data sets (Biass et al. 2009).
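
Comparisons such as that of Biass et al. (2009) depend on deciding when two masses from different data sets represent the same component. The Python sketch below shows one simple way to do this, assuming deconvoluted neutral masses with no duplicates within a list, a fixed ±1 Da tolerance, and greedy one-to-one matching. The tolerance, the definition of percentage overlap (shared masses over all distinct masses), the function names, and the example values are assumptions, not the criteria used in the cited studies; for high-resolution data a ppm-based tolerance would normally be more appropriate.

```python
from bisect import bisect_left


def count_shared_masses(masses_a: list[float], masses_b: list[float],
                        tol_da: float = 1.0) -> int:
    """Greedy one-to-one matching of two neutral-mass lists within +/- tol_da."""
    b_sorted = sorted(masses_b)
    used = [False] * len(b_sorted)
    shared = 0
    for mass in sorted(masses_a):
        i = bisect_left(b_sorted, mass - tol_da)  # first candidate inside the window
        while i < len(b_sorted) and b_sorted[i] <= mass + tol_da:
            if not used[i]:
                used[i] = True
                shared += 1
                break
            i += 1
    return shared


def percent_shared(masses_a: list[float], masses_b: list[float], tol_da: float = 1.0) -> float:
    """Shared masses as a percentage of all distinct masses seen by either method."""
    shared = count_shared_masses(masses_a, masses_b, tol_da)
    union = len(masses_a) + len(masses_b) - shared
    return 100.0 * shared / union


# Hypothetical deconvoluted masses (Da) from two methods.
lc_esi = [1000.0, 2000.0, 3000.0]
maldi = [1000.5, 2500.0, 3000.2]
print(percent_shared(lc_esi, maldi))  # 50.0
```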

Liquid chromatography is not the only way to achieve separation of complex mixtures; more traditional proteomic approaches have also been used to successfully map venom landscapes. For example, Liang’s group used size exclusion chromatography of crude venom followed by either two-dimensional (2D) gel electrophoresis, tryptic digestion, and MS/MS (for components >10 kDa) or ion exchange then rpHPLC and Edman degradation (for masses <10 kDa) to characterize the venom of two Chinese tarantulas, Haplopelma huwenum (Yuan et al. 2007) and Chilobrachys jingzhao (Liao et al. 2007). Both venoms were found to contain >90 proteins (47 identified by MS/MS) and around 100–120 peptides, around half of which were fully or partially sequenced. A very similar picture (>80 proteins and 100–200 peptides) was obtained by applying a similar approach to the venom of Brazilian armed spiders (Phoneutria spp.) (Richardson et al. 2006).

Traditional gel-based proteomics approaches have been used mostly for snake venoms, which is not surprising given they are dominated by proteins larger than 10 kDa. Calvete et al. demonstrated the power of an initial rpHPLC step to resolve and recover most, if not all, venom components in the mass range usually reserved for gel electrophoresis (i.e. 7–150 kDa) as well as small peptides (0.4–7 kDa) (reviewed in Calvete et al. 2007, 2009). The use of rpHPLC with UV detection instead of gels has the added advantages of leaving the isolated components in solution and avoiding the requirement for de-staining and in-gel reactions and extractions, which are time consuming and can result in sample loss.

Determining amino acid sequences

Determining the complexity of venoms via mass counts provides only limited information with regard to biological activity. Masses can provide hints as to which toxin class a peptide might belong to, but to more fully appreciate and exploit the diversity of chemical space present in animal venoms, we need to know the primary and tertiary structure of a peptide and how these relate to its biological activity. Determination of the amino acid sequence of a venom peptide also enables it to be produced via chemical synthesis or recombinant methods (discussed in the next section), thus providing material for further characterization without overexploiting the natural source.

For the past 50 years, peptide sequencing has been performed predominantly by Edman degradation. Although reliable, this method is time-consuming and expensive. Furthermore, Edman degradation is not possible when peptides are N-terminally blocked; for example, according to the ArachnoServer database [www.arachnoserver.org (Wood et al. 2009)], ~1.5% of all known spider toxins are N-terminally blocked with pyroglutamate. The development and refinement of tandem MS (MS/MS) methods has allowed rapid and efficient identification of proteins via the determination of short peptide sequences, which are then matched to protein databases. MS/MS requires very little material, and mixtures of peptides can be sequenced rapidly. In addition, it enables detection of post-translational modifications (PTMs), which are typically not identified in Edman or genomic analyses. Confirming the presence of PTMs is important, as they can contribute to the biological activity of the peptide and/or its resistance to enzymatic degradation (Jakubowski et al. 2006).
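
Many PTMs of venom peptides reveal themselves as characteristic differences between the observed mass and the mass predicted from the sequence. The Python sketch below searches combinations of a few common modifications for one that explains such a difference. The shifts are standard monoisotopic values; the selection of modifications, the tolerance, the function name, and the example masses are assumptions. Different combinations can be isobaric (pyroglutamate formation from Glu plus one hydroxylation shifts the mass by exactly as much as one disulfide bond), which is one reason MS/MS is still needed to assign and localize modifications.

```python
from itertools import combinations_with_replacement

# Monoisotopic mass changes (Da) for some modifications common in venom peptides.
PTM_SHIFTS_DA = {
    "C-terminal amidation": -0.984016,
    "pyroglutamate from N-terminal Gln": -17.026549,
    "pyroglutamate from N-terminal Glu": -18.010565,
    "hydroxylation (e.g. Pro to Hyp)": 15.994915,
    "gamma-carboxylation (Glu to Gla)": 43.989829,
    "disulfide bond": -2.015650,
}


def explain_mass_difference(observed: float, predicted: float,
                            tol_da: float = 0.01, max_mods: int = 3) -> list[tuple[str, ...]]:
    """List combinations of up to max_mods modifications (repeats allowed)
    whose summed shift matches observed - predicted within tol_da."""
    delta = observed - predicted
    matches = []
    for n in range(1, max_mods + 1):
        for combo in combinations_with_replacement(PTM_SHIFTS_DA, n):
            if abs(sum(PTM_SHIFTS_DA[m] for m in combo) - delta) <= tol_da:
                matches.append(combo)
    return matches


# Hypothetical peptide: 2000.000000 Da predicted from sequence, 1994.984684 Da observed.
print(explain_mass_difference(1994.984684, 2000.0))
# [('C-terminal amidation', 'disulfide bond', 'disulfide bond')]
```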

Given the vast array of biologically active peptides present in animal venoms and the lack of genomic information from which to build theoretical protein databases, it is easy to see why de novo sequencing by MS is one of the most important recent developments for venomics studies. The theory and limitations of de novo sequencing by MS/MS and its application to venomics have been thoroughly reviewed recently (Escoubas et al. 2008; Seidler et al. 2009) and hence the following will be a brief overview of the developments most applicable to the rapid determination of novel peptide sequences from venoms.

The most common MS/MS fragmentation technique, collision-induced dissociation (CID), involves collision of an ionized peptide with an inert gas and subsequent fragmentation along the peptide backbone. Multiply charged precursors are preferred, which makes this technique most applicable to ESI–MS, as MALDI-TOF MS produces primarily singly charged species. The resulting mass spectra can be complex, containing multiple ion types (b, y, and a ions, often accompanied by NH3 or H2O losses), which makes them difficult to interpret, particularly if the peptide contains labile groups such as PTMs (Kjeldsen et al. 2007). Thus, rapid, reliable, and high-throughput sequence determination requires alternative fragmentation techniques that produce simpler spectra while still giving comprehensive sequence coverage. Two recently reported techniques, electron transfer dissociation (ETD) and MALDI in-source decay (ISD), appear to hold the most promise for venomics because of their speed and ease of data interpretation, and because they do not require specialized or expensive high-end MS instruments (such as FT-ICR MS).
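
For singly protonated fragments, b ions are N-terminal fragments whose mass is the sum of their residue masses plus a proton, and y ions are C-terminal fragments that additionally retain a water molecule. The Python sketch below computes both series from monoisotopic residue masses, assuming unmodified residues with free cysteine thiols; the function names and example sequence are hypothetical. Leucine and isoleucine have identical masses and cannot be distinguished this way.

```python
# Monoisotopic residue masses (Da); Cys is unmodified (free thiol).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def b_ions(peptide: str) -> list[float]:
    """Singly protonated b1..b(n-1): N-terminal fragments from peptide-bond cleavage."""
    ions, running = [], 0.0
    for residue in peptide[:-1]:
        running += RESIDUE_MASS[residue]
        ions.append(running + PROTON)
    return ions


def y_ions(peptide: str) -> list[float]:
    """Singly protonated y1..y(n-1): C-terminal fragments, which keep the C-terminal OH."""
    ions, running = [], WATER
    for residue in reversed(peptide[1:]):
        running += RESIDUE_MASS[residue]
        ions.append(running + PROTON)
    return ions


peptide = "GASVLK"  # hypothetical sequence
print([round(m, 3) for m in b_ions(peptide)])
print([round(m, 3) for m in y_ions(peptide)])
```

Complementary b and y ions sum to the protonated precursor mass plus one extra proton, a relationship that is often used to pair peaks when interpreting spectra.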

MALDI-ISD is the process whereby a peptide fragments in the time between ionization and ion extraction; it typically produces c and z ions (fragmentation at the N–Cα bond), as opposed to the b and y ions (fragmentation at the peptide bond) seen in CID MS/MS. MALDI-ISD is facilitated by the use of 1,5-diaminonaphthalene (1,5-DAN) as the matrix (Fukuyama et al. 2006). Quinton et al. (2007) demonstrated the usefulness of this approach for characterizing disulfide-rich peptides from cone snail venom. They found that 1,5-DAN has two characteristics that greatly aid the analysis of intact (i.e. fully oxidized) venom peptides, thereby eliminating the need for reduction and alkylation: first, the resulting spectra are dominated by c ions, which greatly facilitates analysis; and second, the number of disulfide bonds can be readily determined because the disulfides are partially reduced in the source. Another major advantage of this approach is that it is achieved in a single MS step and does not require tandem MS instruments. In our own hands, this approach enables generation of ten unambiguous sequence tags of 10–20 residues in approximately 30 min. The major limitations of MALDI-ISD are the requirement for pure peptides (>90%) and the lack of coverage at the N- and C-termini. However, it is an ideal method for rapid generation of sequence tags that can be used to design primers for extraction of complete transcript sequences from a venom-gland cDNA library (discussed below).
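
Because c ions differ from the corresponding b ions by one NH3 (17.026549 Da), consecutive peaks in a c-ion ladder are separated by exactly one residue mass, and reading those differences is the core of generating a sequence tag. The Python sketch below does this for a simulated ladder; it assumes singly charged, unmodified fragments and a ±0.02 Da tolerance, and the sequence and function names are hypothetical. In real spectra the c ion on the N-terminal side of proline is typically absent, because the N–Cα bond of proline lies within its ring, so such gaps would appear as unassigned ('?') steps.

```python
# Monoisotopic residue masses (Da); Cys is unmodified (free thiol).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
AMMONIA = 17.026549


def c_ions(peptide: str) -> list[float]:
    """Simulated singly protonated c1..c(n-1) (N-Calpha cleavage): b ion + NH3."""
    ions, running = [], 0.0
    for residue in peptide[:-1]:
        running += RESIDUE_MASS[residue]
        ions.append(running + AMMONIA + PROTON)
    return ions


def read_tag(ladder: list[float], tol_da: float = 0.02) -> list[str]:
    """Translate consecutive mass differences in a fragment ladder into residues.

    Ambiguous matches are joined with '/', and unmatched gaps are reported as '?'.
    """
    peaks = sorted(ladder)
    tag = []
    for low, high in zip(peaks, peaks[1:]):
        diff = high - low
        hits = sorted({aa for aa, m in RESIDUE_MASS.items() if abs(diff - m) <= tol_da})
        tag.append("/".join(hits) if hits else "?")
    return tag


print(read_tag(c_ions("GASVLK")))  # ['A', 'S', 'V', 'I/L']
```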

ETD is a fragmentation method that evolved from electron-capture dissociation (ECD), but has the advantage that it can be performed on a more accessible mass spectrometer (e.g. linear 2D quadrupole ion trap, as opposed to FT-ICR-MS) and produces high-quality spectra from a single scan, making it amenable to online MS/MS. ETD also results in the generation of c and z ions, and it is particularly useful for venomics investigations as it can produce fragmentation across the entire peptide backbone and is well suited to sequencing peptides with PTMs (Seidler et al. 2009). The power of ETD for venomics was highlighted in a recent study where 31 complete peptide sequences were obtained using just 7% of the venom dissected from a single cone snail, which represented one-third of all masses detected (Ueberheide et al. 2009).

Current de novo peptide sequencing by MS/MS can be time consuming and requires considerable operator expertise to generate and analyse fragmentation spectra. However, advances in fragmentation methods such as MALDI-ISD and ETD (which are complementary to the widespread ESI-CID method) and improved bioinformatic tools for automated interpretation of the spectra are beginning to allow unprecedented, rapid access to the full landscape of venom peptides.

Transcriptomics

An alternative, yet complementary, approach to obtaining venom-peptide sequences is via the use of venom-gland cDNA libraries. This involves either global profiling of the venom via random sequencing of expressed sequence tags (ESTs) or fishing for transcript(s) encoding a specific toxin for which partial amino acid sequence information has already been obtained by Edman degradation and/or MS methods.

ESTs are short DNA sequences (~200–500 bp) generated by sequencing the ends of expressed genes (i.e. transcripts in a cDNA library). Given that most animal-venom peptides are <100 residues long, sequencing of ESTs often yields the complete DNA sequence of a peptide–toxin precursor. The EST approach has been successfully applied to the venom glands of scorpions (Schwartz et al. 2007; D’Suze et al. 2009), spiders (Kozlov et al. 2005; Jiang et al. 2008), and snakes (Francischetti et al. 2004). Typically, several thousand clones are sequenced, resulting in hundreds to thousands of cDNA sequences. This method has already led to the identification of many novel peptide–toxin sequences and, in combination with the powerful new de novo sequencing tools described above, marks a new era in our understanding of the chemical and functional diversity of animal venoms.
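
Getting from an EST to a toxin precursor is essentially translation of the longest open reading frame. The Python sketch below does this with the standard genetic code in the three forward frames only, assuming directional (sense-strand) cDNA; the function name and test sequences are hypothetical, and a real pipeline would also scan the reverse complement and check for a signal peptide.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def longest_orf(cdna: str) -> str:
    """Longest Met-initiated, stop-terminated ORF in the three forward frames,
    returned as a protein sequence without the stop."""
    seq = cdna.upper().replace("U", "T")
    best = ""
    for frame in range(3):
        # Codons containing ambiguous bases (e.g. N) translate as 'X'.
        protein = "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                          for i in range(frame, len(seq) - 2, 3))
        # Every segment except the last is followed by a stop codon.
        for segment in protein.split("*")[:-1]:
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best):
                best = segment[start:]
    return best


print(longest_orf("CCATGAAATGA"))  # MK
```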

In the context of a venoms-based drug discovery program, however, one is typically concerned only with obtaining the complete sequences of toxins (and potentially their paralogs) that were identified as hits in an HTS. Trying to obtain the desired sequence by random sequencing of ESTs would be expensive and far from guaranteed to succeed, especially if the toxin arises from a poorly expressed transcript. Instead, it is safer and cheaper to sequence the desired transcript using RACE (rapid amplification of cDNA ends) technology (Frohman 1993). This approach relies on first obtaining a short sequence tag from the mature toxin using Edman degradation and/or MS. Using a gene-specific primer designed from this sequence tag, 5′ RACE is then employed to obtain upstream sequence information, which includes the 5′ untranslated region (UTR), the signal sequence, and very often a propeptide sequence. 3′ RACE is then used in conjunction with a primer derived from the signal sequence or 5′ UTR to obtain downstream sequence information, including the complete mature toxin sequence and the 3′ UTR. Because the signal sequence is highly conserved (i.e. under strong negative selection) within families of paralogous toxins (Sollod et al. 2005), 3′ RACE will typically yield not just the desired toxin sequence but also the sequences of several paralogs, the activities of which can often provide useful structure–activity relationship (SAR) information.
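The text does not specify how the gene-specific primer is derived from the peptide tag. One common option, assumed here for illustration, is to back-translate the tag into a degenerate oligonucleotide and choose the stretch with the lowest degeneracy. Residues with a single codon, such as Met and Trp, help most. The sketch below uses hypothetical function names and ignores melting temperature, GC content, and 3′-end design. It returns the sense strand, so an antisense primer for 5′ RACE would be its reverse complement.

```python
# Number of bases represented by each IUPAC nucleotide code.
IUPAC_FOLD = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2, "S": 2,
              "W": 2, "K": 2, "M": 2, "B": 3, "D": 3, "H": 3, "V": 3, "N": 4}

# One degenerate codon per residue. Leu, Arg and Ser have six codons each,
# which no single IUPAC triplet captures exactly: YTN and MGN over-cover
# (they include Phe and Ser codons), while TCN omits the AGY Ser codons.
DEGENERATE_CODON = {
    "A": "GCN", "C": "TGY", "D": "GAY", "E": "GAR", "F": "TTY",
    "G": "GGN", "H": "CAY", "I": "ATH", "K": "AAR", "L": "YTN",
    "M": "ATG", "N": "AAY", "P": "CCN", "Q": "CAR", "R": "MGN",
    "S": "TCN", "T": "ACN", "V": "GTN", "W": "TGG", "Y": "TAY",
}


def back_translate(tag: str) -> tuple[str, int]:
    """Sense-strand degenerate primer for a peptide tag and its fold-degeneracy."""
    primer = "".join(DEGENERATE_CODON[aa] for aa in tag.upper())
    fold = 1
    for base in primer:
        fold *= IUPAC_FOLD[base]
    return primer, fold


def least_degenerate_window(tag: str, length: int = 6) -> tuple[str, str, int]:
    """Window of `length` residues whose back-translation is least degenerate."""
    if len(tag) < length:
        raise ValueError("sequence tag is shorter than the requested window")
    best = None
    for i in range(len(tag) - length + 1):
        window = tag[i:i + length]
        primer, fold = back_translate(window)
        if best is None or fold < best[2]:
            best = (window, primer, fold)
    return best


# Hypothetical Edman/MS sequence tag from a mature toxin.
print(least_degenerate_window("GGGGCWMK", length=3))  # ('CWM', 'TGYTGGATG', 2)
```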

Production of recombinant and synthetic toxins

Perhaps the most significant bottleneck with regard to realizing the full potential of animal venoms in the context of a drug discovery pipeline is the rapid production of sufficient quantities of venom peptides for complete structural, functional, and in vivo characterization. Only rarely is sufficient native material available for such studies. The alternative is to produce these peptides by recombinant methods or solid-phase peptide synthesis (SPPS).

Production of venom peptides presents a significant technical challenge due to the large number of structurally important disulfide bonds. Regardless of the method of production, the problem of forming the native disulfide-bond architecture must be addressed, as a toxin with three, four, or five disulfide bonds is theoretically capable of forming 15, 105, or 945 different disulfide-bond isomers, respectively. Coupling SPPS with native chemical ligation (NCL) turns this problem on its head by taking advantage of the high cysteine content of peptide toxins to increase the efficiency of chemical synthesis. NCL involves chemoselective reaction of an unprotected peptide-α-thioester with the N-terminal Cys residue of a second peptide fragment, giving rise to a thioester-linked intermediate that spontaneously rearranges to form a native amide bond at the ligation site (Dawson et al. 1994). The rationale behind NCL is that the yield of synthetic peptide can be significantly improved by dividing the peptide into two or more easily synthesized fragments that can subsequently be joined (Fig. 3). Owing to their high cysteine content, disulfide-rich peptide toxins are ideal candidates for NCL (Jensen et al. 2009).
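The isomer counts quoted above follow from counting the ways to pair 2n cysteines. The first free Cys can pair with any of 2n − 1 others, the next free Cys with any of 2n − 3, and so on, giving (2n − 1)!! = (2n)!/(2ⁿ n!). The short Python check below reproduces 15, 105, and 945 for three, four, and five disulfide bonds. Like the text, it counts theoretical pairings and ignores steric or folding constraints.

```python
from math import factorial


def disulfide_isomers(n_bonds: int) -> int:
    """Ways to pair 2n cysteines into n disulfide bonds: (2n - 1)!! = (2n)! / (2^n n!)."""
    count = 1
    for k in range(1, 2 * n_bonds, 2):  # multiply 1 * 3 * 5 * ... * (2n - 1)
        count *= k
    return count


for n in (3, 4, 5):
    # Cross-check the double factorial against the closed form.
    assert disulfide_isomers(n) == factorial(2 * n) // (2 ** n * factorial(n))
    print(n, disulfide_isomers(n))
```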
Fig. 3

Synthesis of disulfide-rich toxins using native chemical ligation (NCL). In this schematic example, the toxin has been divided into two fragments. The N-terminal fragment (Fragment 1) carries an unprotected peptide-α-thioester at its C-terminus, whereas the C-terminal fragment (Fragment 2) begins with a Cys residue that forms part of the native sequence. Chemoselective reaction of the two fragments yields a thioester-linked intermediate that spontaneously rearranges to form a native amide bond at the ligation site (Dawson et al. 1994). With regard to peptide size, the same limitations apply to the synthesis of NCL fragments as to standard SPPS; that is, the efficiency of fragment synthesis will be sequence dependent but in general will not be problematic up to ~30 residues. The folding conditions for toxins produced by NCL will be no different from those for toxins produced by recombinant methods or traditional SPPS.
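The fragment-size guideline in the Fig. 3 caption suggests a simple planning exercise. The sketch below uses a hypothetical function, `plan_ncl_fragments`, which applies dynamic programming to split a sequence into the fewest fragments of at most 30 residues such that every fragment after the first begins with Cys. Minimizing the number of ligations is an assumption made for illustration. The sketch also ignores the identity of the C-terminal thioester residue, which is known to affect ligation rates, as well as sequence-dependent difficulties in SPPS.

```python
from typing import List, Optional


def plan_ncl_fragments(sequence: str, max_len: int = 30) -> Optional[List[str]]:
    """Split a toxin sequence into the fewest fragments of <= max_len residues
    such that every fragment after the first starts with Cys (the NCL site).
    Returns None if no such split exists."""
    n = len(sequence)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[i]: fewest fragments covering sequence[:i]
    prev = [-1] * (n + 1)    # prev[i]: start index of the last such fragment
    best[0] = 0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            # Only the first fragment may start with a residue other than Cys.
            if start > 0 and sequence[start] != "C":
                continue
            if best[start] + 1 < best[end]:
                best[end] = best[start] + 1
                prev[end] = start
    if best[n] == inf:
        return None
    fragments, end = [], n
    while end > 0:               # walk back through the chosen cut points
        fragments.append(sequence[prev[end]:end])
        end = prev[end]
    return fragments[::-1]


# Hypothetical Cys-rich sequence, used only for illustration.
toxin = "GCKGFWSRCDTKAMCCPGYSCALNGWCKYLGRCYGAKCSVGTPCRKTL"
print(plan_ncl_fragments(toxin))
```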

Venom peptides can also be produced by overexpression in bacteria, yeast, or insect cells. Recombinant protein production is generally more time- and cost-effective than SPPS, and it enables facile site-directed mutagenesis for SAR studies and isotopic labeling for multidimensional NMR (discussed in the following section). However, it may not be possible to introduce exotic PTMs found in the native toxin using this approach, and one is generally limited to genetically encoded L-amino acids for SAR studies.

Escherichia coli, a rod-shaped Gram-negative bacterium, is the most widely used recombinant expression system and generally the first choice for recombinant venom-peptide production. The recombinant peptide is usually directed to the cytoplasm, but this has the significant disadvantage that it does not allow disulfide-bond formation: the intracellular thioredoxin and glutaredoxin disulfide-reducing pathways keep all cytoplasmic cysteine residues in the reduced state (Prinz et al. 1997). Consequently, many venom peptides produced via this approach accumulate as reduced, unfolded proteins in cytoplasmic inclusion bodies, and their purification therefore requires time-consuming denaturation and refolding steps (Zilberberg et al. 1996; Froy et al. 1999). One alternative that potentially obviates this problem is to use an E. coli strain with a defective thioredoxin reductase, which allows cytoplasmic disulfide-bond formation (Tedford et al. 2001). However, in our experience, a more effective approach is to direct the recombinant peptide to the E. coli periplasm, where the machinery for disulfide-bond formation and isomerization is located. This method has been used previously to produce recombinant snake and scorpion toxins (Ducancel et al. 1989; Legros et al. 1997; Korolkova et al. 2001), and we now routinely use this approach for recombinant production of disulfide-rich spider toxins (see Table 2).
Table 2

Recombinant toxins produced by expression in the periplasm of E. coli

| Toxin | Origin | Number of residues | Number of disulfide bonds | Yield (mg/l) | References |
|---|---|---|---|---|---|
| A-erabutoxin | Snake | 61 | 4 | 0.4–0.7 | (Ducancel et al. 1989) |
| KTX2 | Scorpion | 63 | 3 | 0.2–0.3 | (Legros et al. 1997) |
| AaHI/AaHII | Scorpion | 64 | 4 | ND | (Legros et al. 2001) |
| Iberiotoxin | Scorpion | 37 | 3 | 1.0–1.5 | a |
| PcTx1 | Spider | 40 | 3 | 3–5 | a |
| TaITX-1 | Spider | 50 | 3 | ND | a |
| SFI1 | Spider | 46 | 4 | ND | a |
| PaurTx3 | Spider | 34 | 3 | 1.0 | a |

ND not determined

a Current authors (unpublished data)

Eukaryotic expression systems have also been used for production of disulfide-rich toxins. However, mammalian and insect expression systems are typically costly and time-consuming, produce low yields, and make isotopic labeling for NMR studies difficult and expensive (e.g. Escoubas et al. 2003; Ji et al. 2005). In contrast, yeast has proved to be an excellent system for expression of disulfide-rich animal toxins (Anangi et al. 2007). Yeast expression offers several advantages, including high yields, low cost, the ability to incorporate PTMs such as glycosylation, and cost-effective isotopic labeling (although not as facile or cheap as with E. coli) (Demain and Vaishnav 2009). Moreover, the ability to secrete folded toxins directly into the yeast growth medium simplifies toxin purification (Daly and Hearn 2005; Anangi et al. 2007). The two most widely used yeasts are Saccharomyces cerevisiae and Pichia pastoris (Cregg et al. 2000; Macauley-Patrick et al. 2005; Cregg 2007), with the latter proving to be an excellent system for the production of toxins with five or more disulfide bonds (Table 3).
Table 3

Disulfide-rich toxins successfully expressed in yeast

| Toxin | Origin | Host | No. of residues | No. of disulfides | Yield (mg/l) | References |
|---|---|---|---|---|---|---|
| Huwentoxin | Spider | S. cerevisiae | 55 | 3 | 10–12 | (Peng et al. 2006) |
| SHL-Ib1b | Spider | S. cerevisiae | 34 | 3 | 1.0 | (Jiang et al. 2009) |
| GsMTx4 | Spider | P. pastoris | 34 | 3 | 100 | (Park et al. 2008) |
| Jingzhaotoxin-34 | Spider | S. cerevisiae | 35 | 3 | 4.0 | (Chen et al. 2009) |
| ProTx2 | Spider | P. pastoris | 30 | 3 | ND | a |
| Rhodostomin | Snake | P. pastoris | 68 | 6 | 10–15 | (Guo et al. 2001) |
| Echistatin | Snake | P. pastoris | 49 | 4 | 2–5 | b |
| Halydin | Snake | P. pastoris | 76 | 7 | ND | (You et al. 2003) |
| Dendroaspin | Snake | P. pastoris | 59 | 4 | 5–10 | (Chen et al. 2006) |
| Erabutoxin | Snake | P. pastoris | 61 | 4 | 12–18 | b |
| α-Bungarotoxin | Snake | P. pastoris | 75 | 5 | 0.1 | (Levandoski et al. 2000) |
| γ-Bungarotoxin | Snake | P. pastoris | 68 | 5 | 2–5 | (Shiu et al. 2004) |
| κ-Bungarotoxin | Snake | P. pastoris | 68 | 5 | 0.1 | (Fiordalisi et al. 1996) |
| m1-Toxin1 | Snake | P. pastoris | 66 | 4 | 0.1 | (Krajewski et al. 2001) |
| ButaIT | Scorpion | P. pastoris | 38 | 4 | 1.0 | (Pham Trung et al. 2006) |
| BmαTx14 | Scorpion | P. pastoris | 64 | 4 | 100–120 | (Wang et al. 2006) |
| Bmk M1 | Scorpion | P. pastoris | 64 | 4 | 10 | (Shao et al. 1999) |
| BmP05 | Scorpion | S. cerevisiae | 31 | 3 | 8–10 | (Wu et al. 2002) |
| MgTx | Scorpion | P. pastoris | 39 | 3 | 12–15 | a |
| AgTX2 | Scorpion | P. pastoris | 38 | 3 | 15–18 | a |
| APETx2 | Anemone | P. pastoris | 42 | 3 | 2–4 | a |

ND not determined

a Anangi et al. (unpublished data)

b Chuang et al. (unpublished data)

Thus, although still challenging, toxin production is not the major bottleneck it once was for venoms-based drug discovery. In the context of our own drug discovery programs, we have found that periplasmic expression in E. coli works well for most toxins, with extracellular secretion in Pichia pastoris serving as a backup. SPPS is usually a last resort except when non-natural modifications or exotic PTMs are required. This makes SPPS particularly suitable for production of cone snail peptides since, according to ConoServer (Kaas et al. 2008), 77% of all structurally characterized conopeptides contain at least one PTM other than disulfide bonds.

High-throughput structure determination

Determining the structure of a lead toxin provides a platform for understanding the molecular basis of its interaction with its target and, moreover, the structure can be combined with SAR studies to develop a three-dimensional (3D) pharmacophore that can be used for mimetic design (Baell et al. 2004; Schroeder et al. 2004). The method of choice for high-throughput protein structure determination is generally X-ray crystallography, where much of the process has been automated, from sample handling to the actual structure determination process. However, for proteins smaller than 100 amino acid residues, which includes the vast majority of venom polypeptides, NMR spectroscopy is by far the dominant approach (Fig. 4a). The Protein Data Bank currently contains 170 structures of venom proteins smaller than 9 kDa, of which 82% were determined using NMR. Thus, significant increases in the speed of toxin structure determination will necessarily require advances in the speed of NMR data acquisition and analysis.
Fig. 4

a Fraction of structures in the Protein Data Bank (PDB) determined using NMR as a function of molecular mass. Proteins are grouped into 5-kDa bins. Almost 80% of all structures of proteins <5 kDa were determined using NMR. b Comparison of planes from 3D CBCA(CO)NH spectra of a 41-residue spider toxin acquired using either uniform sampling and Fourier transformation (spectra i and ii) or non-uniform sampling (NUS) and maximum entropy (MaxEnt) reconstruction (spectrum iii). Spectrum (i) was collected by uniform sampling of 4,800 complex data points over a period of 42 h. Spectrum (ii) was acquired by uniform sampling of 480 complex data points over a period of 4.2 h, which leads to much lower resolution than for spectrum (i). Spectrum (iii) was acquired by non-uniform sampling of 480 complex data points over 4.2 h, followed by MaxEnt reconstruction. The spectrum obtained via NUS/MaxEnt has significantly higher resolution (i.e. narrower peaks) than the conventional spectrum collected via uniform sampling/Fourier transformation, despite being acquired ten times more quickly.

There are two established approaches for determining structures by NMR: the homonuclear approach relies exclusively on data from hydrogen nuclei (protons), whereas the heteronuclear approach additionally uses data from carbon and nitrogen nuclei (King and Mobli 2010). The classical homonuclear NMR approach (Wüthrich 1986) relies on 2D datasets that can typically be acquired in <3 days. However, these 2D spectra tend to be densely populated with peaks, often leading to ambiguities that must be manually resolved. As a result, resonance assignment tends to be rate limiting and structures can take months to determine. For this reason, homonuclear-based NMR structure determination is likely to be a major bottleneck in venoms-based drug discovery programs as it would take more than 1 year to determine the structure of even a modest number of hits (20–30) from a high-throughput assay.

We therefore advocate a heteronuclear-based approach. Although 3D heteronuclear NMR datasets can take several weeks to acquire, the resulting sparsely populated spectra are much easier to interpret because fewer ambiguities are present; data analysis can be completely automated, and the additional information content results in higher-quality structures. The heteronuclear NMR approach requires the protein to be uniformly labeled with either ¹⁵N or both ¹⁵N and ¹³C, which is now facile owing to the development of efficient bacterial and yeast expression systems, as described in the previous section. Moreover, numerous methods have been developed in recent years to increase the speed at which heteronuclear NMR datasets can be acquired. Although several approaches have been proposed, nearly all of them rely on the same basic principle: acquire a subset of the complete dataset and then use computational approaches to extract the relevant information from the “reduced” dataset. This mode of data acquisition can generally be described as non-uniform sampling (NUS).
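To illustrate what acquiring a subset of the complete dataset means in practice, the sketch below draws a random NUS schedule for a single indirect dimension. It weights early increments more heavily because a decaying NMR signal is strongest there. The exponential weighting, the `decay` constant, and the function name are illustrative assumptions. This is not the schedule used for Fig. 4b, which sampled 480 of 4,800 complex points (10%) in a 3D experiment.

```python
import math
import random


def nus_schedule(n_grid: int, n_samples: int, decay: float, seed: int = 0) -> list[int]:
    """Pick n_samples of n_grid indirect-dimension increments (0-based).

    Increment i is drawn with weight exp(-i / decay), mimicking a decaying
    signal envelope; increment 0 is always kept. Sampling is without
    replacement, and a fixed seed makes the schedule reproducible.
    """
    if not 1 <= n_samples <= n_grid:
        raise ValueError("need 1 <= n_samples <= n_grid")
    rng = random.Random(seed)
    chosen = {0}
    candidates = list(range(1, n_grid))
    weights = [math.exp(-i / decay) for i in candidates]
    while len(chosen) < n_samples:
        pick = rng.choices(range(len(candidates)), weights=weights)[0]
        chosen.add(candidates.pop(pick))   # remove so it cannot be drawn again
        weights.pop(pick)
    return sorted(chosen)


schedule = nus_schedule(n_grid=128, n_samples=32, decay=40.0, seed=1)
print(len(schedule) / 128)  # 0.25 (sampling fraction)
print(schedule[:10])
```

The spectrum is then reconstructed from only the sampled increments, for example by MaxEnt, rather than by a conventional Fourier transform of the full grid.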

Within this general strategy, two different approaches have emerged. One approach is based on correlating data acquisition in two dimensions, which leads to reduced dimensionality projections of the higher dimensional data. Another approach samples at random points of the higher dimensional object and then attempts to approximate the higher dimensional object through a reconstruction process. Methods based on the first approach include G-Fourier transform (GFT) NMR (Szyperski et al. 1993), reduced dimensionality NMR (Ding and Gronenborn 2002), back projection reconstruction (BPR) (Kupče and Freeman 2004), automated projection spectroscopy (APSY) (Hiller et al. 2005), high-resolution iterative frequency identification (Eghbalnia et al. 2005) and projection decomposition (PRODECOMP) (Malmodin and Billeter 2005). The second approach includes methods such as non-uniform FT (Kazimierczuk et al. 2006), maximum entropy (MaxEnt) (Hoch and Stern 2002), multidimensional decomposition (MDD) (Luan et al. 2005), and BPR. Within the protein structure initiative (PSI) funded by the NIH, the Northeast Structural Genomics Consortium (NESG) have chosen to use the GFT approach, the Structural Genomics Consortium Toronto (SGC Toronto) are mainly using MDD, and the Joint Center for Structural Genomics (JCSG) have opted for APSY.
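An idealized GFT-style calculation illustrates the first family of methods. Suppose two indirect evolution periods are co-incremented (t1 = t2 = t) and the second is recorded as separate cosine- and sine-modulated data sets. Adding and subtracting those data sets (cos ± i·sin) then yields 1D spectra with peaks at f1 + f2 and f1 − f2, from which both frequencies can be recovered. The sketch below assumes a noiseless signal with no relaxation, frequencies that fall exactly on DFT bins, and |f1| + |f2| < n/2 to avoid aliasing. The function names are hypothetical, and the code is not taken from any GFT implementation.

```python
import cmath
import math


def dft_peak(signal: list[complex]) -> int:
    """Signed bin (-n/2 < k <= n/2) of the largest-magnitude DFT component."""
    n = len(signal)
    mags = [abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(signal)))
            for k in range(n)]
    k = max(range(n), key=mags.__getitem__)
    return k - n if k > n // 2 else k


def gft_projections(f1: int, f2: int, n: int = 64) -> tuple[int, int]:
    """Peaks of the sum and difference projections for co-incremented t1 = t2 = t.

    Dimension 1 is acquired in quadrature (complex); dimension 2 is recorded
    as separate cosine- and sine-modulated data sets.
    """
    w1, w2 = 2 * math.pi * f1 / n, 2 * math.pi * f2 / n
    cos_data = [cmath.exp(1j * w1 * t) * math.cos(w2 * t) for t in range(n)]
    sin_data = [cmath.exp(1j * w1 * t) * math.sin(w2 * t) for t in range(n)]
    # Linear combinations cos + i*sin and cos - i*sin isolate f1 + f2 and f1 - f2.
    plus = [c + 1j * s for c, s in zip(cos_data, sin_data)]
    minus = [c - 1j * s for c, s in zip(cos_data, sin_data)]
    return dft_peak(plus), dft_peak(minus)


def recover(f1: int, f2: int, n: int = 64) -> tuple[int, int]:
    """Recover (f1, f2) from the sum and difference peak positions."""
    total, diff = gft_projections(f1, f2, n)
    return (total + diff) // 2, (total - diff) // 2


print(gft_projections(10, 3))  # (13, 7)
print(recover(10, 3))          # (10, 3)
```

Two 1D projections thus stand in for a full 2D plane. This trade of dimensionality for speed is what the projection-based methods listed above exploit, each in its own way.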

In our structural venomics efforts at the University of Queensland, we use automated MaxEnt (Mobli et al. 2007a; Mobli et al. 2007b; Mobli et al. 2010) (Fig. 4b), as it allows full automation of the sometimes time-consuming processing step and, from theoretical analysis and the available experimental data, it is clear that MaxEnt works best in situations of low signal-to-noise. By combining NUS/MaxEnt with automated sequence-specific resonance assignment via the PINE web server (Bahrami et al. 2009) and automated NOESY spectral analysis and structure calculation using the CANDID module in CYANA (Herrmann et al. 2002; Güntert 2004), we can routinely determine the structures of venom proteins in less than one week using sample concentrations as low as 150 μM (Mobli et al. 2009). Thus, using a heteronuclear NMR approach, 3D structure determination is no longer likely to be a major bottleneck in venoms-based drug discovery programs.

Concluding remarks

Animal venoms can be viewed as a pre-optimized combinatorial library of small peptides with high potency and specificity for a range of cell-surface targets, such as ion channels, transporters, and GPCRs (Sollod et al. 2005). Consequently, they have become a valuable resource for the discovery of new pharmacological agents. Many of the venom peptides that have emerged from low-throughput assay-guided fractionation studies have proved useful as molecular probes and as lead compounds for drug and insecticide development (Lewis and Garcia 2003; Tedford et al. 2004; Escoubas and King 2009). Nevertheless, the vast majority of the chemical diversity present in animal venoms remains untapped.

Fortunately, the miniaturization of bioassays, advances in recombinant toxin production, and improvements in the sensitivity of mass spectrometry and NMR spectroscopy promise to usher in a new era of venoms-based drug discovery. Not only will these advances expedite the speed at which venoms can be screened and venom peptides characterized, they will facilitate the study of venoms that are only available in small quantities, thus dramatically increasing the biodiversity that can be accessed for drug discovery.

Acknowledgments

We acknowledge financial support from the Australian Research Council (Discovery Grants DP0774245 to GFK and DP0878450 to GFK and PFA) and the Australian National Health & Medical Research Council (Project Grant 511067 to LDR, GFK, and PFA, Program Grant 569927 to RJL and PFA, Senior Research Fellowship to RJL, and Australian Biomedical Training Fellowship to IV).

Copyright information

© Springer-Verlag 2010