Gene organization and evolutionary history

This review focuses on the eukaryotic calpains, although genome databases reveal bacteria, but no archaea, with sequences related to the catalytic core domains (domains dI and dII) of the classical calpains, the criterion used for designating a protein as a calpain. Only single copies of calpain-coding genes are found in the small number of sequenced or partially sequenced protozoan genomes, such as those of the apicomplexan parasites Plasmodium falciparum, Theileria annulata and Cryptosporidium parvum [1–3], and of the amitochondrial parasite Entamoeba histolytica [4]. No calpain-like sequences were identified in the human pathogen Giardia lamblia, a diplomonad often considered to be the most basal eukaryotic organism [5]. Protozoan calpains lack a domain containing EF-hand-type Ca2+-binding sites, as do plant and fungal calpains, and thus it seems likely that the proposed cysteine protease-calmodulin gene fusion leading to the classical calpain structure (for earlier reviews see [6–8]) occurred exclusively within the animal lineage. The nomenclature recommended for describing calpain proteins and the genes encoding them is summarized in Box 1.

Box 1

Uniquely within protozoa, the kinetoplastid parasites Trypanosoma brucei, T. cruzi and species of Leishmania, and the ciliate Tetrahymena thermophila [9–12] display expansion of calpain genes. Fourteen genes encoding calpain-related proteins have been identified in T. brucei, 17 in Leishmania major and 15 in T. cruzi [13]. Most of these capn genes are organized as tandem repeats in a small number of gene clusters that are syntenic between T. brucei, T. cruzi and L. major, indicating that most of the observed expansion and diversity was probably generated by gene-duplication events in an ancestral kinetoplastid. The macronuclear genome sequence of the ciliate T. thermophila [12] predicts a surprisingly large number, 26, of calpain-like proteins. Analysis of human and mouse genomes has identified 14 members of the calpain family. For the few calpain genes analyzed in mammals, sizes range from 13 to 50 kb with 15 to 28 exons [7]. Phylogenetic trees have been generated for isolated domains [8, 14] and for the defining catalytic core domain (dI-dII) in conjunction with the most common, C2-like, auxiliary domain (dIII), of selected species [14, 15]. An analysis by Jekely and Friedrich [14] revealed clear segregation of the EF-hand-containing capn genes (Schistosoma, Caenorhabditis elegans CLP-1, Drosophila A/B and the classic vertebrate capn) from the cluster containing capn5 (tra3) and capn6. Possible gene-duplication events may explain the closer evolutionary relationships between the pairs capn2 and capn8, capn3 and capn9, and capn1 and chicken μ/m [14]. Wang et al. [15] also included capn11 and 12 in their phylogenetic analysis, but neither report included capn10. Of interest would be a more detailed analysis of domain dIII sequences in these genes, to determine whether there is a general functional homology between dIII domains that are related to the calcium- and phospholipid-binding domain of protein kinase C (C2-like domains), as is the case in mammalian calpains 1, 2 and 3 [7] and Drosophila calpain B [16].

A phylogenetic tree rooted to the calpain-related sequence of the prokaryote Porphyromonas gingivalis and based only on the catalytic core (dI-dII) is shown in Figure 1, and suggests that the EF-hand-containing calpains from animals (carboxy-terminal EF-hands) and Tetrahymena (amino-terminal EF-hands) are phylogenetically well separated. This raises the intriguing possibility that the acquisition of EF-hands occurred through independent gene-fusion events in these groups. The phylogenetic analysis also reveals a close relationship of the Tetrahymena calpain containing 21 transmembrane motifs (21TM) with plant calpain (Arabidopsis DEK1), thus raising the possibility of a common origin for these unusual calpains. Lateral gene transfer from a green alga-type endosymbiont of ciliates is one possible mechanism.

Figure 1

The phylogenetic relationship of calpains from diverse evolutionary groups of eukaryotes. Only the catalytic core domains (dI-II) were used to construct the tree. Multiple alignments were done with Clustal X and bootstrapped with PAUP4* (1,000 iterations). Only values greater than 50% are indicated. The tree was rooted with the calpain-related sequence from the prokaryote Porphyromonas gingivalis. A minus sign (-) indicates a nonstandard catalytic triad; species names in bold contain EF-hand motifs and the amino- or carboxy-terminal location of the motif is indicated by superscript N or C. Gray box, representative examples of classical calpains; yellow box, calpains containing a carboxy-terminal SOL domain; magenta box, calpains containing an additional carboxy-terminal C2 domain (also referred to as a Tra3 or T domain); green box, calpains containing 21 amino-terminal transmembrane domains; blue box, calpains containing a carboxy-terminal PalB-type domain. Species names: T. brucei, Trypanosoma brucei; T. thermophila, Tetrahymena thermophila; S. histriomuscorum, Sterkiella histriomuscorum (a ciliate); E. histolytica, Entamoeba histolytica; D. melanogaster, Drosophila melanogaster; S. mansoni, Schistosoma mansoni; C. elegans, Caenorhabditis elegans; H. sapiens, Homo sapiens; A. thaliana, Arabidopsis thaliana; A. gambiae, Anopheles gambiae; C. albicans, Candida albicans; S. cerevisiae, Saccharomyces cerevisiae; P. falciparum, Plasmodium falciparum; C. parvum, Cryptosporidium parvum; P. gingivalis, Porphyromonas gingivalis. Calpains listed with unpublished, nonstandard abbreviations: 3TM, three carboxy-terminal transmembrane domains; 5EF, five-EF-hand motifs; 21TM, 21 amino-terminal transmembrane domains; DI-II, domains dI-dII-only calpain without further recognizable motifs. Single calpains have been identified in organisms where only species names are given. Sequences and accession numbers are available in Additional data file 1.
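
The bootstrap labelling convention described in this legend can be illustrated with a minimal sketch. It assumes hypothetical clade names and replicate counts (none of these numbers come from the analysis behind Figure 1) and only shows how a count of supporting replicates out of 1,000 becomes a percentage that is displayed when it is greater than 50%.

```python
# Illustrative only: clade names and replicate counts are hypothetical,
# not values taken from the tree in Figure 1.
N_REPLICATES = 1000  # number of bootstrap iterations stated in the legend


def support_labels(clade_counts, n_replicates=N_REPLICATES, threshold=50.0):
    """Return {clade: percent support} for clades with support above threshold."""
    labels = {}
    for clade, count in clade_counts.items():
        percent = 100.0 * count / n_replicates
        if percent > threshold:  # strictly greater than 50%, as in the legend
            labels[clade] = percent
    return labels


example_counts = {"clade_A": 980, "clade_B": 500, "clade_C": 312}
shown = support_labels(example_counts)
print(shown)
```

In this toy input only clade_A would be labelled; clade_B sits exactly at 50% and is therefore omitted under the "greater than 50%" rule.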

Characteristic structural features

Calpains have a highly modular organization, as illustrated in Figure 2, which shows the types of protein modules and their organization within specific calpains. The catalytic subunit of classical calpains has four domains, of which dI and dII constitute the catalytic core, dIII is a C2-like domain capable of calcium and phospholipid binding, and dIV contains five EF-hand motifs, the fifth serving in some calpains as a dimerization motif for binding to a 'small subunit' (see below) or to form homodimers. The non-classical calpains all have domains dI and dII (by definition), but not all have dIII or dIV, and some contain other types of modules (Figure 2). Although defined by their 'catalytic' core sequence, an increasing number of calpains lack one or more of the essential catalytic amino-acid residues, suggesting functions unrelated to proteolysis. It has been speculated that these 'pseudo-proteases' are involved in regulatory processes [13, 17]. A very recent report describes a role for the non-catalytic calpain-6 in the stabilization of microtubules [18].
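
The definitions in this paragraph can be summarized as a small sketch. It is a simplification under stated assumptions: the `Protein` class, the `classify` function and all example entries are hypothetical names introduced here, a classical calpain is taken to be exactly the four-domain arrangement dI-dII-dIII-dIV (linkers and insertions are ignored), and a protein is called a predicted pseudo-protease whenever its active-site residues differ from the Cys, His, Asn triad.

```python
from dataclasses import dataclass

CATALYTIC_TRIAD = ("Cys", "His", "Asn")    # key catalytic residues of calpains
CORE = ("dI", "dII")                       # domains that define the family
CLASSICAL = ("dI", "dII", "dIII", "dIV")   # four-domain classical layout


@dataclass(frozen=True)
class Protein:
    """Hypothetical record: ordered domain labels plus active-site residues."""
    name: str
    domains: tuple
    active_site: tuple = CATALYTIC_TRIAD


def classify(protein):
    """Classify a protein by domain composition and catalytic residues."""
    if not set(CORE) <= set(protein.domains):
        return "not a calpain"
    family = "classical" if protein.domains == CLASSICAL else "non-classical"
    if protein.active_site == CATALYTIC_TRIAD:
        activity = "predicted protease"
    else:
        activity = "predicted pseudo-protease"
    return f"{family} calpain, {activity}"


# All entries below are made-up illustrations, not annotations of real genes.
examples = [
    Protein("hypothetical_classical", ("dI", "dII", "dIII", "dIV")),
    Protein("hypothetical_core_only", ("dI", "dII")),
    Protein("hypothetical_variant", ("dI", "dII", "dIII", "C2"),
            ("Lys", "His", "Asn")),
    Protein("hypothetical_EF_only", ("EF",)),
]
for p in examples:
    print(p.name, "->", classify(p))
```

Membership is tested on the dI-dII core alone, mirroring the family definition, so the auxiliary modules only affect the classical versus non-classical label.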

Figure 2

A modular architecture is found in all members of the calpain protein family. All the identified human calpain genes (hCAPN) are depicted with selected examples from other species. The presence of domains dI and dII is used to define the family. Domain dIII is defined as the classical calpain C2-like domain; other C2 domains can also be present (see hCAPN5 and 6). Domain dIV is the penta-EF-hand module shared by classical calpains and their small subunit Cpns-1 (where the penta-EF-hand module is known as domain dVI). Domain dV, specific to the small subunit Cpns-1 and without known motifs, is not shown here. The black bars linking modules represent sequences without known motifs and are unique to individual calpains. *The classical calpain hCAPN3 has two insertions, indicated by Δ here. These proteins have lost key catalytic residues and are predicted to lack protease activity. Species: Dm, Drosophila melanogaster; Ce, Caenorhabditis elegans; En, Emericella (Aspergillus) nidulans; Sc, Saccharomyces cerevisiae; Tt, Tetrahymena thermophila; Tb, Trypanosoma brucei. Domain abbreviations: C2, protein kinase C conserved region 2 (domain involved in calcium-dependent phospholipid binding); IVdEF, domain dIV with degenerate EF-hand motifs that are unlikely to bind calcium; EF, domain with EF-hand motifs distinct from domain dIV; KAC, kinetoplastid acylated domain (myristic acid and palmitic acid chains are indicated by zigzag lines); MIT, microtubule interacting and trafficking molecule domain; palB, palB-homologous domain; PKA, protein kinase A regulatory subunit domain; SOL, small optic lobe domain; TMD, transmembrane domain; Zn, zinc finger domain. The functions of some of these protein modules are not yet defined. The domain structures were assembled using SMART [79] and the peptidase database MEROPS [80].

Some of the classical calpains are heterodimers of the 'large' catalytic subunit with the so-called small subunit Cpns-1 (formerly known as Capn-4). Cpns-1 is composed of two domains: dV, an amino-terminal glycine-rich unstructured domain, and dVI, a penta-EF-hand module homologous with dIV of the catalytic subunit. Domain dVI was the first calpain module for which structures were solved in the absence and presence of calcium (reviewed in [7]). These structures provided crucial insight into the nature of heterodimer formation in the classical calpains, anticipated the small contribution of this domain to the calcium-induced conformational change of the holoenzyme, and later revealed details of the interaction of the Cpns-1 protein with a peptide mimicking calpastatin, the endogenous and specific inhibitor of the classic calpains 1 and 2 [19] (Figure 3a).

Figure 3

Structures of calpain modules and calpain-2. (a) Ribbon diagram of the structure of the penta-EF-hand module (domain dVI) of Cpns-1 from pig. It is shown here as a homodimer (one chain green, one cyan). The short helical peptides (yellow and magenta) are 19-residue mimics of the conserved C peptide of the calpain inhibitor calpastatin bound to dVI in the presence of calcium (orange spheres). The structure is from PDB 1NX1 (Todd et al. [19]). (b) Ribbon diagram of the structure of the rat calpain-2 heterodimer. The catalytic core domains (dI-dII) are in light and dark blue, respectively. Catalytic residues are shown as magenta sticks (with the engineered mutation of C105S) and the arrow designates the active-site cleft between domains dI and dII. Domain dIII (brown) is C2-like. The penta-EF-hand domain dIV of the large subunit (Capn-2) is in yellow, and the similar domain dVI of the small subunit (Cpns-1) is in orange. Domain dV, the amino-terminal glycine-rich region of the small subunit, was truncated by protein engineering; in the human enzyme it is highly flexible and structurally unresolved [21]. The amino-terminal helix and linker loops are in green. The structure is from PDB 1DF0 (Hosfield et al. [20]). The dVI homodimer in (a) is very similar to the heterodimer formed between the dIV and dVI domains, and can be used to model this interaction. (c) Ribbon diagram of the structure of the calcium-bound catalytic core (domains dI-dII) of rat calpain-1 based on PDB 1TL9 (Moldoveanu et al. [26]). The bound inhibitor leupeptin is shown as gold, blue and red spheres; the magenta spheres are two calcium ions bound to hitherto unknown sites. All ribbon diagrams were generated using PyMol (DeLano Scientific, Palo Alto, CA, USA).

The determination of the calcium-free structure of calpain-2 from rat and human [20, 21] was key to furthering our understanding of the classic calpains (Figure 3b) and revealed unanticipated insights. In contrast to most allosterically regulated enzymes, where activation relieves steric hindrance at a pre-formed active site, classic calpains require a conformational change to realign the key residues (Cys, His, Asn) to make them catalytically competent. In addition, domain dIII in calpain-2 shows some structural resemblance to C2 domains, which suggests possible additional mechanisms for binding calcium and phospholipids. Mutagenesis experiments provide evidence for the function of dIII as an electrostatic switch contributing to the maintenance of the catalytic core in an inactive form and the subsequent stabilization of the active enzyme [22, 23]. The structure of calpain-2 also provided a platform for modeling the structures of calpain-1 (since confirmed by crystallization and structure determination of a chimeric calpain-1-like enzyme [24]) and of calpain-3 [25].

The isolated catalytic core of calpain-1 (the dI-dII module, referred to as 'mini-calpain') yielded a calcium-bound structure [26] (Figure 3c). Quite surprisingly, in some calpains, for example rat calpain-1, the isolated core showed weak but measurable Ca2+-dependent proteolytic activity, a result of unpredicted and novel calcium-binding sites [26, 27]. Comparisons between chimeric enzymes (mixtures of domains from calpains 1 and 2 or 3), the inactive heterodimer, and mini-calpains indicate some details of the mechanism of regulation of catalytic function by calcium. Activation of the enzyme core within the heterodimer involves proteolytic removal of the amino-terminal 'anchor' helix (see Figure 3b) or the release of its binding to dVI, weakening of the electrostatic interactions between dIII and dII, and the binding of multiple calcium ions to the EF-hand modules (dIV and dVI) and to dIII, which trigger changes that permit binding of calcium to the calcium-binding sites of the catalytic core. The weakening of the constraints that maintain the dI-dII domains in their 'inactive' positions and the cooperative Ca2+ binding to the core allow the realignment of the core into its active state, in which it bears a substantial structural resemblance to papain. The isolated core also provides a useful reagent for screening calpain inhibitors to find potential drug candidates [26–28].

Localization and function

Calpain function has been investigated by both genetic and cell-biological routes. Table 1 summarizes these studies and their results. The targeted deletion, and more recently the conditional deletion, of the cpns1 gene [7, 29] showed that at least one classical calpain is essential for early embryogenesis in mammals. Targeted deletion studies have since shown that capn1 is not essential [30] whereas capn2 is [31]; the function of calpains in development is not yet known, however. Loss of capn9 in NIH3T3 cells results in a more transformed phenotype, as shown by increased growth in soft agar [32], but to our knowledge this gene has not yet been targeted in whole organisms. Multiple genetic defects that truncate, or otherwise inactivate, calpain-3 (also called p94) seem to be a cause of limb-girdle muscular dystrophy type IIa [25, 33], thus identifying the importance of calpain-3 in skeletal muscle integrity. Targeted deletion of capn3 in mice produces a model for assessing its role in muscle function and repair [33, 34]. Specific splice variants of capn3 occur in the lens of the eye and are linked to the formation of cataracts [35]. One factor contributing to increased susceptibility to type 2 diabetes, a multifactorial disease, may be variations in the capn10 gene. This idea still sparks controversy, as the initial observation identified a polymorphism in a capn10 intron in populations with increased risk for diabetes [7, 36, 37]. However, studies show that calpain-10 may function in stimulated secretion and/or pancreatic cell death [38, 39], and thereby be relevant to this disease. Two non-classical calpains, Tra3 and PalB (orthologs of calpain-5, capn5, and calpain-7, capn7), mediate signal transduction pathways for sex determination in nematodes [40] and adaptation to pH in yeast [41], respectively.

Table 1 The physiological functions of calpains as revealed by genetics

Biochemical and cell-biological studies also provide significant insights into calpain physiology. It is often speculated that calpains function, or become activated, when associated with membranes, despite their predominantly cytoplasmic localization [6, 7]. Although membrane binding is not well substantiated for classical calpains, predicted transmembrane segments in phytocalpain and some ciliate calpains suggest an evolutionary link between calpain function and membranes. At least two acylated calpain-like proteins in the kinetoplastids L. major and T. brucei are biochemically associated or co-localize with cellular membranes ([42] and KE, unpublished work). Acylated proteins are often associated with the cytoplasmic face of membranes and lipid rafts, where they are implicated in signal transduction [42, 43]. Thus, the small amount of calpain fractionating biochemically with membranes may be the active, physiologically relevant, enzyme population, although suggestions that vertebrate calpains localize to lipid rafts or caveolae require further confirmation. Biophysical studies demonstrate the ability of a conserved peptide (GTAMRILGGVI) located in the amino-terminal domain dV to form a membrane-penetrating α-helical structure [44], providing one mechanism for calpains 1 and 2 to bind to membranes. For many calpains, the C2-like domain (dIII) provides an additional or alternative mechanism for membrane association via its phospholipid-binding properties. A recent study has demonstrated the importance of dIII-mediated membrane binding of calpain-2 in living cells [45]. In addition, the critical self-sealing repair of damaged plasma membranes requires the activity of ubiquitous calpains, which may act to remodel the underlying cortical cytoskeleton [46].

In contrast to relatively promiscuous degradative proteases, calpains cleave only a restricted set of protein substrates and use complex substrate-recognition mechanisms, involving primary and secondary structural features of target proteins. Proteins identified as substrates for calpains include numerous membrane-bound or membrane-associated proteins, such as calcium-ATPase, the epidermal growth factor (EGF) receptor, the ryanodine receptor, the calcium receptor, the NMDA receptor (a glutamic acid receptor), β-integrins, aquaporin, the transporters ABC-A1 and GLUT4, and proteins interfacing with receptors and the cytoskeleton, such as talin, α-spectrin (α-fodrin) and ezrin, among many others (see Table 11 in [7] for a more extensive, though still incomplete, list).

A wide variety of receptors function upstream of the intracellular activation of calpains (Table 2). The most thoroughly studied models focus on the roles of calpains in cell motility in response to either EGF [47] or integrin engagement [48]. Additional work links calpains to cell transformation and oncogenesis [49, 50]. Knockdown strategies utilizing antisense RNAs or small interfering RNAs to study the roles of calpains in cell transformation and in other cellular processes have provided significant evidence for non-redundant, distinctive functions for each ubiquitous calpain isoform. Although less widely studied, there is also increasing evidence for the externalization of calpains and their extracellular contribution to tissue damage in response to toxicants or other factors [51–53]. These destructive roles may relate to the documented involvement of calpains in pathways that trigger apoptosis and/or necrosis [54–59] and, discovered most recently, autophagy [60, 61]. Thus, there is considerable evidence for a complex relationship between calpain activity and the functions of both caspases and the proteasome.

Table 2 Functional diversity of calpains

Frontiers

Despite great advances in our knowledge of calpains 1 and 2, much is yet to be learned about the evolution of the family and the range of functions of its members. Genomic sequences from a wide range of organisms document the extreme diversity and modular nature of the calpain protein family. Current evidence suggests that the acquisition of the penta-EF-hand module, characteristic of the classical calpains, may be restricted to animals, but that EF-hands may have been acquired independently in Tetrahymena calpains. The use of different strategies for associating with membranes, such as transmembrane domains, C2-like domains, and acylation, supports the importance of membrane association in calpain function. More genomic information from representative organisms, particularly protozoa, is required to better analyze the evolutionary relationships within this family. The proteolytic core module is now relatively well characterized as to structure and function. Distinguishing the overlapping or unique substrate specificities [62] and inhibitor sensitivities of the proteolytically active calpain isoforms is expected to aid the design of studies aimed at determining their roles in cellular pathways. For the family members lacking key catalytic residues, alternative functions await discovery. Future work is also needed to determine how the modules associated with the core influence its function. There is likely to be interplay between protein-protein interactions, membrane binding, calcium binding (in many calpains) and, potentially, post-translational modifications in the modulation of calpain function. Many calpain proteins remain to be purified and characterized biochemically, so the challenge of identifying their relevant binding partners remains.

It is now established that some calpains are components of regulatory networks involved in fundamental processes at cellular (for example, motility) and organismal (for example, embryogenesis) levels. Further work will determine if and when specific isoforms and the multitude of their possible splice variants are expressed in either a tissue-specific or time-dependent manner in cells. Understanding the function(s) of individual isoforms in a variety of physiological contexts - from protozoa to humans - remains the ultimate challenge. RNA interference will continue to make a significant contribution to these goals, and the design of calpain-resistant substrates [63, 64] will provide a way of documenting calpain-catalyzed, limited proteolysis in vivo. The future development of biosensors to visualize calpain activity (or activation), like those generated for other signal pathway molecules [65], may also provide a major advance. Efforts to develop cellular calpain 'reporter' substrates have been described [66, 67] and the tight binding of calpastatin to active calpains 1 and 2 [6, 7, 68] may be exploited to develop reporters that selectively recognize the active conformation of these enzymes (D.E.C. and L.M. Vanhooser, unpublished work). More data and new approaches are needed to enhance understanding of the regulation of both proteolytic and non-proteolytic calpains. Careful transcriptional, translational and activity-based profiling - ideally able to detect the variety of splice variants - will be required to establish detailed expression patterns for calpains in relation to embryogenesis, differentiation or other cellular processes. The time is ripe to define the regulatory circuits in which calpains participate, to complete the assessment of their in vivo substrates and to characterize the regulators of all functions of calpains.

Additional data files

Additional data file 1 contains the sequences and accession numbers of the calpain sequences in the phylogenetic tree in Figure 1.