Background

Three kinds of molecular motor proteins have been identified so far: myosins, kinesins, and dyneins [1, 2]. While kinesins and dyneins move on microtubule tracks, myosins are the only motors that use the energy of ATP hydrolysis to power movement along actin filaments. The first myosin was identified in skeletal muscle tissue, and subsequently a large number of proteins containing the myosin motor domain have been discovered in eukaryotic cells, where they fulfill a variety of cellular functions ranging from cell division, cellular locomotion, and vesicle transport to muscle contraction [1].

Myosin proteins are typically divided into three major domains. The motor domain that is usually found at the N-terminus contains the nucleotide and actin binding sites. The neck domain, following the motor domain, consists of a helical segment that binds specific myosin light chains or calmodulin. The target sequence for light chain binding is based on the consensus sequence IQxxxRGxxxR [3] and therefore termed the IQ motif. Myosins may have zero to over 15 IQ motifs in the neck region. The third domain is called the tail domain and contains class specific functional motifs that are responsible for the accomplishment of the different tasks of the myosins in the cell [1]. The classification of the myosins is based on the phylogenetic relation of the myosin motor domains. Altogether, up to 20 myosin classes have been assigned in recent reviews [4, 5]. In addition, there are many myosins for which close homologs have not been found and which are therefore termed orphan myosins. Recently, two analyses of myosin proteins describing conflicting findings have been published [6, 7]. Both disagree with previously established models of myosin evolution [reviewed in [5]], because of the erroneous data sets and analysis methods used. However, we have performed an exhaustive analysis of 1910 manually annotated myosins from 303 species that will be referred to in the analysis of the Dictyostelium discoideum myosins (F. Odronitz and M. Kollmar, submitted).

Different organisms contain only a subset of the classes but in many cases several homologs of the same class. For example, the Entamoeba histolytica genome reveals one class-I and one class-II myosin [8], while the Saccharomyces cerevisiae genome contains two class-I, one class-II, and two class-V myosins [9]. In contrast, more complex genomes show more diversity. The Caenorhabditis elegans genome contains 17 myosins belonging to seven classes, the Drosophila melanogaster genome contains 13 myosins belonging to ten classes, and the Homo sapiens genome reveals 40 myosins belonging to twelve classes. Plants do not have a large repertoire of classes but many homologs of the class-VIII and class-XI myosins (e.g. Arabidopsis thaliana contains 4 class-VIII and 13 class-XI myosin genes).

For the lower eukaryote Dictyostelium discoideum, 14 myosins or potential myosins have been reported so far. The first myosin identified, a class-II myosin, has been named mhcA (myosin heavy chain A), and all subsequently discovered myosins were named alphabetically (MyoA-M). For MyoF and MyoH only small fragments of the motor domain have been obtained, while for MyoG and MyoL only potential loci have been identified. The class-I myosins (MyoA-E, MyoK) and the class-II myosin have unambiguously been assigned in the past [e.g. [5, 10]]. MyoI has been classified as a class-VII myosin [5, 11], although the phylogenetic grouping has been very weak. No myosin similar to MyoM has been found, and it has therefore been added to the orphan class. The closest homologues found for MyoJ came from the class-XI myosins, a class containing only plant myosins, and it was therefore grouped with them [12, 13].

Almost all myosins exist at least as heterodimers by binding of light chains to the neck region. The light chains bind to the IQ motif and have an essential role in stabilizing the neck, so that it can function as a rigid lever arm that swings relative to the motor domain to generate movement [14]. In addition, the light chains can be important regulatory sites, either through phosphorylation or by binding of Ca2+ [15]. The light chains belong to the family of calmodulin-related proteins. Most myosins are expected to bind calmodulins, while the class-II myosins always bind two specific calmodulin-related proteins, the essential and the regulatory myosin light chains. Recently, calmodulin-related proteins have been identified that specifically bind to a certain myosin, e.g. a light chain binding to Toxoplasma gondii Myo14A [TgMLC-1, [16]], or the light chain binding to Acanthamoeba castellanii Myo1C [AcMICLC, [17]]. For Dictyostelium discoideum, in addition to the essential and the regulatory myosin light chains, two further specific light chains have been discovered. The class-I myosin MyoD binds a light chain that is phylogenetically related to the Ac Myo1C light chain [MlcD, [18]], and the class-I myosin MyoB forms a heterodimer with a light chain that is unique in that it consists of only half of a calmodulin-related protein [MlcB, [19]].

Dictyostelium is one of the most famous model organisms for studying motile processes in cells, especially those related to the actin cytoskeleton. Recently, the question has been raised how many myosins a simple multicellular organism like Dictyostelium would need to accomplish all the diverse motility-related tasks [10]. Here, the complete repertoire of myosin family proteins in the slime mold Dictyostelium discoideum is presented. The analysis revealed thirteen myosin proteins of which three have not been described so far. The new myosin family members are described and the already published myosins revised and partially reclassified. In addition, all members of the calmodulin-related protein family in Dictyostelium have been identified and analyzed to reveal the complete repertoire of myosin light chains.

Results and discussion

Identification of Dictyostelium discoideum myosins

The TBLASTN search with the motor domain of the Dictyostelium discoideum class-II myosin against the Dictyostelium genome sequence retrieved the previously identified and described genes and three new myosin genes (Table 1). Small fragments of two of these myosins (Myo1F and Myo5A, former MyoH) have already been obtained in an investigation that combined low-stringency hybridization, physical mapping techniques, and PCR [20], but have not been verified in later studies. The study also revealed two additional loci that were referred to as myoG and myoL. The analysis of the Dictyostelium genome now showed that the myoG locus is a real locus, and the corresponding new myosin protein has been named MyoG. However, there is no evidence for further myosin genes and the myoL locus has most probably been assigned based on experimental artefacts.

Table 1 Members of the myosin gene superfamily in D. discoideum.

The Dictyostelium cDNA database in Japan [21] was searched for the new myosin genes to confirm their expression (exclusion of pseudogenes) and gene structure, as the newly identified myoF, myoH, and myoG genes contain several introns (Table 1). However, the cDNA clones only cover the region around the last of the introns of Myo5A (MyoH). The extremely high AT content of Dictyostelium introns and the help of a multiple sequence alignment of over 1700 myosin motor domains (M. Kollmar, unpublished data) nevertheless allowed the unambiguous identification of the introns and, subsequently, the protein coding regions.
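The compositional signal mentioned above can be illustrated with a minimal Python sketch that scans a genomic DNA string for AT-rich windows. This is not the procedure used in the analysis, which relied on manual inspection together with the multiple sequence alignment; the window size and threshold below are hypothetical values chosen only for illustration.

```python
def at_fraction(seq: str) -> float:
    """Fraction of A/T bases in a DNA string (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)


def at_rich_windows(seq: str, window: int = 20, threshold: float = 0.85):
    """Return merged (start, end) spans, 0-based half-open, of AT-rich windows.

    `window` and `threshold` are illustrative assumptions, not values
    taken from the analysis described in the text.
    """
    spans = []
    for start in range(0, len(seq) - window + 1):
        if at_fraction(seq[start:start + window]) >= threshold:
            end = start + window
            if spans and start <= spans[-1][1]:
                # Overlaps or touches the previous span: extend it.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


# Toy sequence: GC-rich flanks around an AT-only core.
toy = "GCGCGCGCGC" + "AT" * 15 + "GCGCGCGCGC"
candidate_spans = at_rich_windows(toy)
```

Spans found this way would only be candidates; intron boundaries would still have to be checked against splice sites and, as in the text, against the aligned protein sequence.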

The Dictyostelium cDNA database also contains at least cDNA fragments for all previously reported myosins. The analysis of the genome sequence and the cDNA data revealed several major discrepancies to the published sequences (Table 2) in addition to many amino acid substitutions. The sequences derived from the genome-sequencing project are without much doubt the correct sequences, because the genome sequence was built on high coverage and is completely in accordance with the cDNA data. Also, the sequences derived from the genome data are in agreement with the multiple sequence alignment, while the previously published sequences create unusual insertions and substitutions. It is very unlikely that the differences are due to strain differences, because the AX4 strain, which has been used to create the genome sequence and cDNA libraries, has been derived from the AX2 and AX3 strains used in earlier publications [22].

Table 2 Differences in the sequences of previously published genes and the genes obtained from the genome sequencing project.

Expression pattern of the new myosin genes

The cDNA database is based on libraries of cells obtained from several different stages of the developmental cycle of Dictyostelium [23]. One library contains data from clones obtained at the so-called first finger stage (14 h – 16 h) of development. A second library was constructed from vegetatively growing cells. A third library contains clones derived from "sexually competent cells" (cells are cultured in liquid medium in the dark, and are competent for fusion with opposite mating type cells), cells that are roughly equivalent to growth phase cells. The data supposed to contain full-length genes is based on libraries from cells at the following stages of the developmental cycle: axenically growing cells, cells developed on nitrocellulose filter to aggregation stage (8 hours), slug stage (16 hours) and early culmination stage (20 hours). For Myo5A (MyoH) and MyoG cDNA fragments have been obtained in all libraries containing the potentially full-length genes indicating that these two myosins are expressed in all stages of the developmental cycle. However, only one gene fragment has been obtained for Myo1F from the library of full-length genes of axenically growing cells. The cDNA data is not supposed to reveal the complete expression pattern of all proteins at the different developmental stages. But the number of obtained clones and the occurrence in a specific library indicates that Myo1F might not be strongly expressed, and primarily expressed in vegetative cells. These conclusions have of course to be confirmed by further experimental data.

Classification, nomenclature, and phylogenetic analysis

The classification and suggested revised nomenclature of the Dictyostelium myosins are summarized in Table 1 and shown in Figure 1. The general nomenclature for myosin proteins uses the term Myo followed by the class number (Arabic numeral) and the variant (Latin letter). The class-II myosins are exceptions, as Mhc or Myh is used as abbreviation (leaving out the class number) followed by the variant designation as either a Latin letter or an Arabic numeral. To avoid severely increasing the number of classes, myosins that do not have a homolog in at least one other organism should be referred to as orphans. So far, the Dictyostelium myosins have not consistently been named according to that nomenclature. The phylogenetic analysis of over 1700 myosin motor domains (M. Kollmar, unpublished data) together with the completed sequence of the Dictyostelium genome now allows a revision. The new nomenclature does not severely change the old names: the class-II myosin stays untouched, the class-I myosins only get the class designation added, with the exception of MyoK, which will now be referred to as Myo1G, and MyoM is still an orphan myosin and will not be renamed as long as it cannot be grouped to a certain class. The newly identified MyoG (preliminarily named according to its locus) is also an orphan myosin and might be renamed as soon as further homologous myosins are identified. However, the classification of the other myosins, MyoH, MyoJ, and MyoI, is less clear-cut.

Figure 1

Schematic diagram of the domain structure of the Dictyostelium discoideum myosins. The class designation is given in the motor domain of the respective myosin in Roman numerals. Orphan myosins have been designated n.c. (not classified). A colour key to the domain names and symbols is given on the right except for the myosin domain that is coloured in blue.

A phylogenetic tree of 180 myosins of the classes V, VIII, and XI (including the former class-XIII myosins) does not group MyoH and MyoJ to any of the already assigned classes (Figure 2, additional file 1). Instead of assigning these myosins to a new class, as happened in the past e.g. for the classification of the Acetabularia cliftonii class-XIII myosins, I suggest naming MyoH and MyoJ Myo5A and Myo5B, respectively. Both myosins have a domain organisation similar to that of the class-V and the class-XI myosins. But because Dictyostelium separated from the Fungi/Metazoa lineage after the separation of the plants, MyoH and MyoJ should rather be referred to as class-V than class-XI myosins. This classification is supported by the analysis of the over 1700 myosins, which revealed MyoH and MyoJ to be more closely related to the class-V myosins than to the plant myosins (M. Kollmar, unpublished data).

Figure 2

Phylogenetic tree of 180 motor domains of class-V, -VIII, and -XI myosins (including the former class-XIII myosins). Amino acid sequences of the motor domains were aligned in a structure-guided manual alignment process. Support values for each internal branch were obtained by 1,000 bootstrap steps. The values for the innermost branches are given. The scale bar corresponds to 0.1 estimated amino acid substitutions per site. See additional file 1: SuppMat1 for the complete tree containing all internal labels.

MyoI does not group to any of the designated classes containing myosins with MyTH4/FERM domains, as is shown by the phylogenetic analysis of 126 myosin motor domains of classes VII, X, XV, and XXII (Figure 3, additional file 2). It does not group to the class-XXII myosins although it branches from the class-XXII myosins in the tree shown. The reason is that the branching occurs very close to the separation point of the other classes, and in phylogenetic trees including all classes MyoI also often branches very early from the class-X myosins. A close look at the protein sequence in the multiple sequence alignment shows that MyoI shares several class-specific features of all four classes, which prevents a better classification. In addition to the motor domain sequence, the domain organisation of MyoI is unique compared to members of the other classes (Figure 4). In contrast to class-VII myosins, MyoI does not have an N-terminal SH3-like domain, it has four instead of five IQ motifs, and it lacks the first FERM domain. The tails of class-X myosins differ from that of MyoI as they are characterised by two consecutive PH (pleckstrin homology) domains followed by the MyTH4/FERM tandem domain. The domain organisation of the class-XV myosin tails is similar to that of MyoI except that the mammalian myosins have a very long N-terminal domain, while the insect myosins lack the SH3 domain. Like MyoI, the class-XXII myosins do not have an N-terminal domain, but their tail domain is different, containing two complete MyTH4/FERM tandem domains but no SH3 domain. Thus, MyoI cannot be grouped to any of the already designated classes and should be considered an orphan myosin. This implies that MyoI cannot be considered a specific model for class-VII myosins, as has been suggested earlier [11, 24]. The Dictyosteliida diverged before the evolution of the Metazoa. MyoI therefore rather resembles a common ancestor of the four classes instead of grouping to one of them.

Figure 3

Phylogenetic tree of 126 motor domains of class-VII, -X, -XV, and -XXII myosins. Amino acid sequences of the motor domains were aligned in a structure-guided manual alignment process. Support values for each internal branch were obtained by 1,000 bootstrap steps. The values for the innermost branches are given. The scale bar corresponds to 0.1 estimated amino acid substitutions per site. See additional file 2: SuppMat2 for the complete tree containing all internal labels.

Figure 4

Domain organisation of examples of class-VII, -X, -XV, and -XXII myosins. The class designation is given in the motor domain of the respective myosin in Roman numerals. A colour key to the domain names and symbols is given on the bottom except for the myosin domain that is coloured in blue. For comparison, a myosin of class-XII is shown because its tail also contains MyTH4 and FERM domains, although class-XII myosins phylogenetically do not group at all to the other classes shown.

Domain structure of the known myosins

Most of the tail domains of the known myosins have already been described and functionally analysed in some detail [25, 26]. While there is agreement on the determination of the larger tail domains like the class-I myosin membrane-binding and SH3 domains, or the Myo5B (MyoJ) DIL (dilute) domain, there are contradicting predictions of the IQ motifs. The IQ motif is a short sequence motif of alpha-helical structure that is able to bind calmodulin or calmodulin-like proteins. IQ motifs have been predicted in the past by using the general pattern IQxxxRGxxxR [3]. The multiple sequence alignment of the whole myosin family now allows a revision of the motif (M. Kollmar, unpublished data). According to the revised motif, several IQ motifs are found in the Dictyostelium myosins that have not been recognised before (Figure 1 and Figure 5). The starting isoleucine in the motif is often substituted by other large hydrophobic amino acids. The glutamine at the second position is mainly conserved, except for some cases where it can be substituted by lysine, glycine, or glutamate. A very important position is the residue before the first arginine, which is almost always a large hydrophobic amino acid, in most cases an aromatic one. The first arginine is also highly conserved, except for a few cases where it is substituted by lysine, leucine, or isoleucine. The following glycine is not very conserved in IQ motifs of myosin tails. The position after that glycine is mainly occupied by large hydrophobic amino acids, in most cases aromatic residues, but histidines, asparagines, and glutamines are also found. The second arginine of the initial motif is also not very conserved. Using the revised motif, no IQ motif is predicted for Myo1G, one is predicted for Myo1B and Myo1D, and two are predicted for the other class-I myosins. Myo1C might contain a third IQ motif, but then the packing of the light chains would be relatively dense. However, Myo1B binds a light chain that consists of only two EF-hand motifs [half the size of a normal calmodulin-related protein, [19]]. Two similarly small myosin light chains could easily bind to the two closely located IQ motifs of Myo1C. According to the new motif description, MyoI is now predicted to contain four IQ motifs, in contrast to earlier predictions of three IQ motifs [11], and Myo5B (MyoJ) contains six. The domain compositions of the newly identified myosins are described in more detail below.
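As an illustration of pattern-based IQ motif prediction, the following Python sketch scans a protein sequence with the classical consensus IQxxxRGxxxR and with a relaxed pattern assembled from the substitutions listed above. The relaxed pattern is an assumption made for this example and is not the revised motif itself, which is unpublished; in particular, the residue set taken to mean "large hydrophobic" (ILVMFYW) and both toy sequences are hypothetical.

```python
import re

# Classical consensus IQxxxRGxxxR, with x standing for any residue.
CLASSICAL_IQ = r"IQ...RG...R"

# Assumed set of "large hydrophobic" residues (illustrative choice).
HYDROPHOBIC = "ILVMFYW"

# Illustrative relaxed pattern; NOT the revised motif from the analysis.
RELAXED_IQ = (
    f"[{HYDROPHOBIC}]"     # position 1: I or another large hydrophobic residue
    "[QKGE]"               # position 2: Q, occasionally K, G, or E
    ".."                   # positions 3-4: unconstrained
    f"[{HYDROPHOBIC}]"     # position 5: large hydrophobic, often aromatic
    "[RKLI]"               # position 6: first R, rarely K, L, or I
    "."                    # position 7: the G is poorly conserved
    f"[{HYDROPHOBIC}HNQ]"  # position 8: hydrophobic/aromatic, or H, N, Q
)


def find_motifs(sequence, pattern):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1))
            for m in lookahead.finditer(sequence.upper())]


# Toy sequences (hypothetical, not taken from any myosin).
classical_hits = find_motifs("AAAIQKAWRGYLARKAAA", CLASSICAL_IQ)
relaxed_hits = find_motifs("AAALQKAFKAYAAA", RELAXED_IQ)
```

A relaxed pattern of this kind recovers degenerate sites that the strict consensus misses, at the cost of more false positives, so hits would still need to be checked against the alignment and the predicted helical context.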

Figure 5

Alignment of the putative IQ motifs of the Dictyostelium myosins. The alignment shows the IQ motifs of the myosins. The numbers at the beginning indicate the position in the full-length sequence. The gap has only been introduced to facilitate the identification of the IQ residues.

Domain structure of the newly identified and analysed myosins

Myo1F

This myosin is the seventh class-I myosin found in the Dictyostelium genome. Based on the short fragment of the motor domain that was obtained in an earlier PCR-based screen of the genome [27], it was already supposed to group to this subfamily. Its motor domain sequence is most similar to that of Myo1E. Unlike the other class-I myosins, which contain only small N-terminal extensions to the motor domain of 8 to 15 residues, it has an N-terminal domain of 50 amino acids. However, based on its sequence, this small domain is unlikely to fold into a structure similar to the N-terminal domain of class-II myosins. Directly following the motor domain, Myo1F has two consecutive IQ motifs that are strong indicators for binding of calmodulin or a calmodulin-like myosin light chain. A short coiled-coil region has been predicted for the small region between the two IQ motifs. The short distance of only about 20 residues, however, makes it unlikely that Myo1F would be able to dimerize in that area. The remainder of the tail is, like those of Myo1A and Myo1E, predicted to be all α-helical. In the centre of the tail there is a domain rich in basic residues, and it is therefore outlined as a membrane-binding domain in analogy to those of the other class-I myosins. Whether this region really binds to membranes remains to be shown.

Myo5A (former MyoH)

Myo5A seems to be the smaller brother of Myo5B (MyoJ). Both are phylogenetically closely related and show similar domain organisations. Altogether, Myo5A (MyoH) is 480 aa shorter in length. It also has an N-terminal SH3-like domain, but the N-terminal extension is not as long as that of Myo5B (MyoJ). In contrast to Myo5B (MyoJ), it has three additional long insertions into surface loops of the motor domain that are not observed in any other Dictyostelium myosin sequence. It has five instead of six IQ motifs for binding of light chains, and the coiled-coil region is predicted to be considerably shorter than that of Myo5B (MyoJ). Except for the DIL domain at the C-terminus of the tail there is no further sequence similarity between the two Myo5 tail sequences.

MyoG

MyoG does not group to any of the class-VII, class-X, class-XV, or class-XXII myosins, or any other myosin, and is therefore designated an orphan myosin. MyoG is one of the longest myosins of the whole family. It has an N-terminal domain of 440 residues that does not show any homology to other proteins. The sequence in the N-terminal domain contains long stretches of consecutive asparagines and serines that are typical for many Dictyostelium proteins [28]. The head domain is followed by four IQ motifs. The C-terminal tail is characterised by two MyTH4/FERM tandem domains that are separated by a long region containing an SH3 domain and a short predicted coiled-coil region. The regions between these recognised domains also contain many stretches of consecutive polar residues. An outstanding case is the nine-fold consecutive repeat of the motif 'SQQQQ'. The C-terminal end of the tail contains a second predicted coiled-coil region. The coiled-coil domains in myosins are normally located directly behind the IQ motifs and are responsible for dimerisation. MyoG might also exist as a dimer in vivo, but the heads are not expected to move on actin filaments in a hand-over-hand mechanism similar to that found for other myosin dimers like class-V myosins [29].

Structural features of the myosin motor domains

The myosins of Dictyostelium contain several protein-specific extensions to surface loops of the motor domain of class-II myosin (Fig. 6). The most prominent loop extension is the insertion of ~130 amino acids into loop-1 of Myo1G. Except for members of an Arthropoda-specific myosin class, which contain loop-1 extensions of up to 300 amino acids [30], this is by far the longest loop-1 of all myosins. Myo5B (MyoJ) also contains a considerably longer loop-1 (20 residues in addition to loop-1 of MhcA). Loop-1 has been implicated in influencing access to the nucleotide-binding site. It has been shown in an analysis of chimeric loop-1 mutants of smooth muscle myosin that the mobility of this loop correlates with the rate of ADP release [31]. Larger and more flexible loops resulted in faster rates of ADP release. For class-V myosins, the rate-limiting step in the catalytic cycle is ADP release [32], a necessity for these myosins to walk over long distances along the actin filaments without detaching. Based on these results, Myo5B (MyoJ) might be a very unconventional class-V myosin with a fast rate of ADP release. It will be very interesting to see whether Myo5B (MyoJ) is still a long-distance cargo-transporting myosin like the other homologs of the class.

Figure 6

Structure of the motor domain of Dictyostelium MhcA. Structure of the motor domain of Dictyostelium MhcA (PDB: 1g8x) highlighting loops for which some of the Dictyostelium myosins have long insertions. The red numbers indicate the length of the loops for MhcA. The approximate lengths of the insertions are given for the respective myosins.

Loop-2 has been shown to be involved in both weak and strong binding interactions with actin [33]. According to this study, especially positively charged residues strengthen the binding to actin. MyoG and MyoM contain long extensions of loop-2 compared to MhcA, but the sequences contain mainly glycines, prolines, and polar amino acids. Thus, both myosins are not expected to have considerably different actin-binding properties. Loop-4 is the loop that is furthest removed from the actin surface as has been suggested from actomyosin models derived from electron microscopy. Except for some class-I, the insect class-V, some class-XVII, and some apicomplexa myosins, almost all myosins have a loop-4 of similar length. The loop has been suggested to be involved in either interactions with actin or regulatory proteins that are bound to actin [34]. For the class-I myosin myr1 from rat, it has been shown that a head fragment localizes to the same highly dynamic actin structures at the cell cortex as the full-length construct and not to the actin filaments that are regulated by tropomyosin [35]. An extended loop-4 might be responsible for this localisation as it might hinder binding to the tropomyosin stabilized less dynamic actin filaments as they occur in stress fibers. Myo5A (MyoH) contains one of the longest loop-4 of all myosins and might therefore only bind to the dynamic actin structures at the cell cortex and not to actin structures that are stabilized by other proteins.

MyoM has an extended loop at the same position where the class-VI myosins have one of their prominent insertions. This loop has been suggested to affect nucleotide binding by changing the conformation of a following loop [36]. This loop therefore protrudes within the nucleotide-binding pocket, resulting in a decrease in nucleotide accessibility. The functions of the other two surface loops, for which Myo5A (MyoH) has long extensions, have not been analysed so far.

Phylogenetic analysis of calmodulin-like proteins in Dictyostelium

An iterative TBLASTN search of the Dictyostelium genome data, starting with the sequence of Dictyostelium calmodulin A (Dd CalA), revealed 35 CBPs (calcium-binding proteins) that exclusively contain EF-hand motifs (Table 3). While 32 CBPs contain four EF-hands, three contain only two EF-hand motifs (MlcB, MLC-1, and CBP10). 22 CBPs have already been described in the literature. To classify the remaining 13 CBPs and to identify those that could potentially function as myosin light chains, the Dictyostelium EF-hand proteins were compared with EF-hand containing proteins from Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, and some specific myosin light chains from other organisms. The obtained 176 calmodulin-related proteins were manually aligned and a phylogenetic tree was created (Figure 7, additional file 3).

Table 3 Members of the myosin light chain, calmodulin and calmodulin-related gene superfamily in D. discoideum.

Figure 7

Phylogenetic tree of calmodulin-related proteins. Amino acid sequences of the calmodulin-related proteins were aligned manually. Support values for each internal branch were obtained by 1,000 bootstrap steps. The scale bar corresponds to 0.1 estimated amino acid substitutions per site. See additional file 3: SuppMat3 for the complete tree containing all internal labels.

According to the phylogenetic tree, 16 of the Dictyostelium CBPs (CBP1-13, calfumirin) do not belong to an already named class but are closely related to the frequenins. CBP4c is very similar to CBP4a and CBP4b, but the N-terminus could not be identified. CBP13 is a protein fragment, missing the N-terminus as well as the C-terminus, and is very similar to CBP12. CBP4c and CBP13 are therefore most probably pseudogenes. Those members of this group that have already been identified belong to developmentally regulated genes. Their distinct spatial expression patterns suggested that they might be involved in morphogenesis [37]. Seven of the Dictyostelium CBPs belong to the frequenin class. The members of the frequenin class are also highly developmentally regulated. An interaction with a myosin family protein has not been described in the literature for any member of this group in any organism. Dictyostelium also contains two centrins and two proteins grouping to the calcineurin family.

So far, only the ELC and RLC myosin light chains as well as calmodulin subfamily proteins have been shown to bind to the IQ motifs of myosins. The exceptions are the myosin light chain of Toxoplasma gondii Myo14A [TgMLC-1, [16]], the light chain of Acanthamoeba castellanii Myo1C [AcMIMLC, [17]], and, identified in Dictyostelium, a light chain of Myo1D that is phylogenetically related to the Ac Myo1C light chain [MlcD, [18]], and a light chain that binds to Myo1B [MlcB, [19]]. In addition to MlcD and MlcB, Dictyostelium contains one ELC and one RLC myosin light chain that bind to MhcA. For Saccharomyces cerevisiae it has been shown that the ELC (also termed Mlc1p) not only binds to the class-II myosin, but also to the class-V myosin Myo2p. Chicken myosin-5A even binds two different ELCs in addition to calmodulin. Therefore, the Dictyostelium ELC is also expected to participate in binding to the class-V myosins. Dictyostelium also contains two members of the calmodulin subfamily that have not specifically been shown to bind to myosins, but are strongly expected to do so, in accordance with results obtained for other organisms. The analysis also revealed another CBP containing two EF-hand motifs that is most closely related to MlcB. It is therefore also strongly expected to bind to a myosin, and has been termed MLC-1. As Myo1B and Myo1D each bind a specific light chain, and Myo1C is the closest homolog of the remaining class-I myosins, MLC-1 might be the specific light chain for Myo1C, but this has of course to be proven by biochemical experiments.

CBP14 does not group to any specific class but is more similar to the members of the calmodulin/ELC/RLC part of the phylogenetic tree than to the developmentally related CBPs. If it were able to function as a myosin light chain, then it would be the founding member of another specific myosin light chain class.

Conclusion

The analysis of the Dictyostelium discoideum genome revealed thirteen members of the myosin family, of which three have not been described before. The phylogenetic analysis of their motor domains placed seven myosins in the class-I myosins (Myo1A to Myo1G; Myo1F is a new member), one in the class-II myosins (MhcA), and two, of which Myo5A (MyoH) has newly been identified, in the class-V myosins. Three myosins (MyoG, MyoI, and MyoM) do not have a close homolog in any other organism and could therefore not be classified. In contrast to previous analyses, an extensive comparison with 126 class-VII, class-X, class-XV, and class-XXII myosins now showed that MyoI does not group into any of these classes and cannot be used as a model for class-VII myosins. The third new myosin has been named MyoG. It contains an N-terminal extension of over 400 residues, and a tail consisting of four IQ motifs and two MyTH4/FERM tandem domains that are separated by a long region containing an SH3 domain. Although its tail organisation is similar to that of class-VII myosins, the motor domain of MyoG does not group into any existing class.

Four specific myosin light chains have been identified so far (ELC, RLC, MlcB, MlcD) next to two calmodulins. The analysis of the genome revealed another protein containing two EF-hand motifs that is closely related to MlcB. Based on its phylogenetic relationship it is highly expected to be a myosin light chain. A further calmodulin-related protein, termed CBP14, phylogenetically groups to the ELC/RLC/calmodulin branch of the tree and might therefore also be a myosin light chain, although it does not have a close homolog in other model organisms.

Methods

Identification of Dictyostelium myosins and calmodulin related proteins

The full-length sequences of ten of the thirteen Dictyostelium myosins have been reported in the literature (Table 1). Partial sequences of Myo1F (MyoF) and Myo5A (MyoH) have already been reported [27] and were used as the basis for the manual assembly of the genes from clones published by the Dictyostelium Genome Sequencing Project. These genes are consistent with the assembly of the recently published genome [28]. The two sequences, as well as MyoG, which has also been derived from genomic data [38], are therefore predicted sequences that have not been verified by complete cDNAs. The Japanese cDNA project [21, 23] includes only small parts of these new myosins, which are, however, consistent with the predicted sequences. The Japanese cDNA project also includes at least fragments of all other myosins. No additional myosin genes have been found in the genome of Dictyostelium, and thus the reported "myoL" gene locus [27] might have originated from experimental artefacts.

The Dictyostelium calmodulin-related genes were identified in an iterated TBLASTN search of the completed Dictyostelium genome, starting with the protein sequence of CalA. In this way, all proteins that contain only EF-hand motifs were collected (Table 3). The predicted sequences were verified by searches against the Japanese cDNA database [23]. The EF-hand-containing proteins from Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens were obtained in iterative TBLASTN searches of the corresponding genomes. Specific myosin light chains from other organisms were obtained from the protein database at NCBI.
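The logic of an iterated search, in which every new hit is used as a query until no further sequences turn up, can be sketched as follows. This is a minimal illustration only and not the procedure actually run: the `search` callable is a hypothetical stand-in for a single TBLASTN run, and the hit table is invented toy data.

```python
from collections import deque
from typing import Callable, Iterable, Set


def iterated_search(seed: str, search: Callable[[str], Iterable[str]]) -> Set[str]:
    """Collect every sequence reachable from `seed` by repeated searching.

    `search` is a hypothetical stand-in for one TBLASTN run: it takes a
    query identifier and returns the identifiers of its significant hits.
    """
    found = {seed}
    queue = deque([seed])
    while queue:
        query = queue.popleft()
        for hit in search(query):
            if hit not in found:  # only previously unseen hits become new queries
                found.add(hit)
                queue.append(hit)
    return found


# Invented toy hit table (not real search results); only the seed name
# CalA is taken from the text.
toy_hits = {
    "CalA": ["protein_1", "protein_2"],
    "protein_1": ["CalA"],
    "protein_2": ["protein_3"],
    "protein_3": [],
}
collected = iterated_search("CalA", lambda query: toy_hits.get(query, []))
```

The search stops on its own once a round of queries yields no new identifiers, which is the sense in which the collection can be regarded as complete for the chosen significance threshold.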

Building trees

The complete analysis of the myosin motor domains derived from the NCBI non-redundant database and the EST and genomic sequences of more than 270 eukaryotes will be published elsewhere (Kollmar, unpublished data). All these myosin sequences, including the Dictyostelium myosins, together with their accession numbers and additional references, will be accessible through the newly designed CyMoBase [[39], F. Odronitz and M. Kollmar, submitted]. The database comprises over 1700 myosin sequences (Feb. 2006), which have been used for the phylogenetic classification of the Dictyostelium myosins. The underlying phylogenetic tree was built from a structure-guided multiple sequence alignment. The phylogenetic trees of the class-V/-VIII/-XI and the class-VII/-X/-XV/-XXII myosins were constructed from the corresponding sequences of this alignment. All phylogenetic trees are unrooted and were generated using the neighbour-joining method with bootstrapping (1,000 replicates) as implemented in ClustalW [standard settings, [40]] and drawn using TreeView [41].

The phylogenetic tree of the calmodulin-related proteins was calculated from a manual sequence alignment, which was improved by iteratively creating phylogenetic trees and adjusting the alignment. The resulting phylogenetic tree is unrooted and was generated using the neighbour-joining method with bootstrapping (1,000 replicates) as implemented in ClustalW and drawn using TreeView.
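The bootstrap step can be illustrated with a short sketch: alignment columns are resampled with replacement, and a pairwise distance matrix is computed for each replicate as input for neighbour joining. This is not the ClustalW implementation. The uncorrected p-distance, the handling of gaps, and the toy alignment are assumptions made for illustration, and the tree-building step itself is omitted.

```python
import random
from typing import Dict, Tuple


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Return one bootstrap replicate: columns resampled with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over positions where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0  # no comparable positions (assumption: treat as distance 0)
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances for all sequence pairs in the alignment."""
    names = sorted(alignment)
    return {(m, n): p_distance(alignment[m], alignment[n]) for m in names for n in names}


# Invented toy alignment (hypothetical sequences, equal length, '-' marks a gap).
toy = {"seqA": "MKT-LLV", "seqB": "MKTALLI", "seqC": "MRTALQI"}
rng = random.Random(0)  # fixed seed so that the replicates are reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
matrices = [distance_matrix(rep) for rep in replicates]
```

Building a neighbour-joining tree from each of the 1,000 matrices and counting how often a given grouping recurs yields the bootstrap support for that grouping.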

Domain and motif prediction

Protein domains were predicted using the SMART [42, 43] and Pfam [44, 45] web servers. The prediction of protein motifs (coiled coils, leucine zippers, prenyl-group binding motifs) is mainly based on the results of the PredictProtein server [46, 47]. The IQ motifs and N-terminal domains were predicted manually, based on their homology to similar domains of other myosins included in the multiple sequence alignment of the myosins. The recognition motifs included in the SMART and Pfam databases are too restrictive for this purpose, as they were created from the small datasets available some years ago.
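Why a restrictive recognition pattern misses divergent motifs can be shown with the IQ consensus IQxxxRGxxxR given in the Background. The sketch below compares a strict regular expression for this consensus with a relaxed variant. The relaxed pattern and both test sequences are assumptions chosen for illustration. Neither pattern corresponds to the profiles used by SMART or Pfam, and neither replaces the manual, alignment-based prediction described above.

```python
import re
from typing import List, Pattern, Tuple

# Strict pattern: literal translation of the consensus IQxxxRGxxxR.
STRICT_IQ = re.compile(r"IQ.{3}RG.{3}R")

# Relaxed pattern (illustrative assumption): allows conservative
# substitutions at the first position (I/L/V) and at the basic positions (R/K).
RELAXED_IQ = re.compile(r"[ILV]Q.{3}[RK]G.{3}[RK]")


def find_motifs(pattern: Pattern[str], sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start position, matched segment) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


# Hypothetical test sequences, not taken from any real myosin.
canonical = "AAIQKAWRGYLARKK"   # fits the strict consensus
divergent = "AALQKAWKGYLAKKK"   # I->L and R->K substitutions

strict_hits_canonical = find_motifs(STRICT_IQ, canonical)
strict_hits_divergent = find_motifs(STRICT_IQ, divergent)
relaxed_hits_divergent = find_motifs(RELAXED_IQ, divergent)
```

The strict pattern recovers the motif only in the canonical sequence, whereas the relaxed pattern also finds the divergent one. Loosening a pattern in this way raises the risk of false positives, which is one reason for checking candidate motifs against the multiple sequence alignment.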