Background

Some mechanisms involved in cell morphogenesis, such as membrane vesicle transport, are conserved at least among crown eukaryotes (metazoa, fungi and plants) [1,2], whereas others, such as those involving extracellular structures or the precise roles of different Rho-like GTPases [3], are not. Yet other cellular processes, such as cytokinesis, often recruit conserved proteins to accomplish superficially dissimilar tasks (for example, budding, cleavage or phragmoplast-based cell division of plant cells) [4]. For many morphogenetic mechanisms, the question of evolutionary conservation remains unresolved because the available information is limited to one or a few model organisms. This is the case, for example, for the molecular mechanisms that mediate communication between the cytoskeleton and the cell surface. However, the recent increase in data available from a number of genome projects allows wide-ranging searches for homologs of known components of signaling and morphogenetic pathways. The results of such searches can lead both to experimentally testable hypotheses and to general conclusions regarding the evolution of morphogenetic processes.

Formins, also known as formin homology (FH) proteins, are proteins implicated in cellular and organismal morphogenesis of both metazoa and fungi. On the cellular level, they are involved in the establishment and maintenance of cell and/or tissue polarity [5,6], in cytokinesis [4] and in the positioning of the mitotic spindle [7]. They interact directly or indirectly with actin, profilin, Rho-like GTPases [5,6,8,9,10,11], the yeast Spa2 protein and septins [12,13], proteins containing SH3 or WW domains [10,14], dynein and microtubules [7,15,16,17]. The yeast formin homolog encoded by BNI1 is localized to the cell periphery and participates in positioning cortical actin patches towards distinct regions of the plasma membrane [5,13,18]. Some kind of contact with the plasmalemma (in addition to that mediated by a Rho-like GTPase) might therefore be expected, although there is no evidence as yet for such a contact. Furthermore, metazoan formins are believed to be cytoplasmic or nuclear proteins [19,20].

Nothing is known about formin function in plants, although the existence of two Arabidopsis thaliana proteins containing the conserved formin-homology 2 (FH2) domain has been reported recently [6,10]. Given that all known formins represent a well-defined family, this class of proteins may be a good candidate for a systematic genome sequence search. Here, I present the results of such an approach, which has led to the identification of putative plant formin genes, as well as to the finding that the evolutionarily old formin domain may be used in a number of different ways and contexts ('modules' as defined by Hartwell et al. [21]) by recent eukaryotes.

Results and discussion

Formins are defined by the presence of two sequence domains: the low-complexity, proline-rich FH1 and the carboxy-terminal FH2 [6,10,22]. A third domain, the amino-terminal FH3 motif, has been characterized biochemically but is rather poorly delimited in sequence terms [23]. Despite a conflicting consensus definition, this motif appears to be identical to the amino-terminal conserved block found in some formins by Wasserman [10]. I have used the L-x-x-G-N-x-M-N motif (single-letter amino-acid notation; x is any amino acid), present in the FH2 domain of most fungal and metazoan formins [10], to search for putative Arabidopsis formin homologs and found eight such inter-related genes (see Materials and methods and Table 1). All of them correspond either to hypothetical open reading frames (ORFs) or to unannotated genomic or cDNA clones; the existence of cDNA clones indicates that at least some of them are expressed in vivo. These putative genes and their predicted protein products will be referred to henceforth as AtFORMINs 1 to 8.
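The motif search described above amounts to a simple pattern match over protein sequences. As an illustration only (the study used PatMatch, whose implementation is not reproduced here), the following Python sketch converts a hyphen-separated pattern such as L-x-x-G-N-x-M-N into a regular expression and reports every occurrence, including overlapping ones. The function names and the toy sequence are hypothetical.

```python
import re

def motif_to_regex(pattern):
    """Turn a hyphen-separated motif (e.g. 'L-x-x-G-N-x-M-N') into a regex string.

    'x' matches any single residue; every other token is a literal residue.
    """
    return "".join("." if tok.lower() == "x" else re.escape(tok.upper())
                   for tok in pattern.split("-"))

def find_motif(protein, pattern):
    """Return (0-based start, matched residues) for each occurrence of the motif.

    The pattern is wrapped in a zero-width lookahead so that overlapping
    occurrences are all reported.
    """
    rx = re.compile("(?=(" + motif_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in rx.finditer(protein.upper())]

# Hypothetical toy sequence containing one FH2-like motif.
print(find_motif("AALKRGNAMNKK", "L-x-x-G-N-x-M-N"))  # [(2, 'LKRGNAMN')]
```

A plain pattern of this kind is very permissive, which is why the hits in this study were followed up by a similarity search with statistical scores.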

Table 1 Putative formin-related genes of Arabidopsis thaliana

Sequence comparison with known formins revealed a genuine FH2 domain in all Arabidopsis formins (Figure 1). However, even the longest predicted proteins, encoded by the AtFORMIN 3, -4 and -5 genes, lack parts of the FH2 region that are ubiquitously conserved among the corresponding genes of fungi and metazoa (Figures 1 and 2), although not necessarily among their protein products, because some formin mRNAs undergo complex splicing [24]. In all cases, sequence motifs corresponding to the missing regions were found within the predicted introns by visual inspection of three-frame translations. Because the reliability of mRNA structure prediction is limited [25], incorrect exon identification may explain the apparent deletion of this region of the FH2 domain. However, the possibly mispredicted intron encoding subdomain g of AtFORMIN4 is split by a frameshift mutation. Although this could reflect a sequencing error, it remains possible that plant formin homologs have a modular structure within the FH2 domain at the gene level, and that at least some of the FH2-related sequences within predicted introns are vestiges of exons lost by mutation.
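A three-frame translation simply reads the DNA from each of the first three offsets, so that a coding stretch is visible whatever its phase relative to the predicted exons. A minimal Python sketch using the standard genetic code is given below; it assumes the gene lies on the forward strand (the reverse complement is not handled), and the function names are hypothetical. The translations in this study were produced on the ExPASy server, not with this code.

```python
from itertools import product

# Standard genetic code; codons are enumerated in T, C, A, G order,
# so the i-th letter below is the amino acid of the i-th codon ('*' = stop).
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AMINO)}

def translate(dna, frame=0):
    """Translate the forward strand from offset `frame` (0, 1 or 2).

    A trailing incomplete codon is dropped; codons containing ambiguous
    bases (e.g. N) are translated as 'X'.
    """
    dna = dna.upper().replace("U", "T")
    return "".join(CODONS.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))

def three_frame(dna):
    """Forward-strand translations in all three reading frames."""
    return [translate(dna, f) for f in range(3)]

print(three_frame("ATGGCCTAA"))  # ['MA*', 'WP', 'GL']
```

Scanning each translated frame of a predicted intron for FH2-like residues is the automated counterpart of the visual inspection described above.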

Figure 1

Alignment of the FH2 domain of selected formins and definition of the subdomain modules. Subdomain modules (a-j) are marked in color. Red dots denote the position of introns (not shown for MFORMIN, for which only the mRNA sequence is available). The consensus line shows 80% consensus of the EMBL DS39866 alignment. Numbers indicate positions within the sequence and the size of unaligned insertions; residues corresponding to unambiguous consensus and/or shared by all Arabidopsis formins are highlighted. For gene terminology see Table 1 and Materials and methods.

Figure 2

Domain structure of Arabidopsis and selected yeast and animal formins. Letters denote subdomain modules within FH2 as defined in Figure 1. Only the 'highly likely' membrane-spanning segments are shown.

Proline-rich regions corresponding to FH1 were identified in all Arabidopsis formins. Surprisingly, there are two such regions in AtFORMINs 2, 6 and 8, a feature not observed in the non-plant formins examined (listed in Materials and methods). Neither motifs corresponding to FH3 nor the coiled-coil regions flanking FH1 (common but not ubiquitous in non-plant formins [10]) were found. The structure of FH2, the overall protein size (smaller than most non-plant formins) and the domain layout of Arabidopsis formins therefore point to possible plant-specific features (Figure 2). This idea is supported by the topology of an evolutionary tree that consistently places Arabidopsis formins in a branch separate from other members of the formin family (Figure 3).

Figure 3

Unrooted evolutionary tree of FH2 subdomains a, c and h constructed by the neighbor-joining method. Numbers at nodes indicate bootstrap values. Branches in agreement with the tree previously reported by Zeller et al. [6] are highlighted in green, novel branches in yellow.
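Bootstrap values such as those in Figure 3 report how often a branch reappears among trees inferred from resampled data. The resampling step, performed in this study by PHYLIP's SEQBOOT with 500 replicates (see Materials and methods), can be sketched in Python as follows. This is an illustrative sketch, not SEQBOOT itself; the function name and toy alignment are hypothetical, and distance calculation, tree building for each replicate and the consensus step are not shown.

```python
import random

def bootstrap_replicates(alignment, n_replicates=500, seed=0):
    """Nonparametric bootstrap of an alignment.

    Each replicate is built by drawing alignment columns with replacement
    until it has the original length; all sequences use the same columns.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    rng = random.Random(seed)  # fixed seed for reproducibility
    replicates = []
    for _ in range(n_replicates):
        cols = [rng.randrange(length) for _ in range(length)]
        replicates.append(["".join(seq[c] for c in cols) for seq in alignment])
    return replicates

toy = ["LKRGNAMN", "LKAGNSMN", "IQAGNAMD"]  # hypothetical aligned fragments
reps = bootstrap_replicates(toy, n_replicates=3)
print(len(reps), [len(s) for s in reps[0]])  # 3 [8, 8, 8]
```

Resampling whole columns, rather than individual residues, keeps the residues of one alignment position together, which is what makes the replicate a plausible alternative data set.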

As in the non-plant formins, the amino-terminal portions of all Arabidopsis formins are divergent, although there is 63% identity between AtFORMINs 1 and 4 in the overlapping parts of their sequences. Analysis of AtFORMIN sequences with SMART [26,27] revealed no previously characterized domains outside the FH2 region. However, putative amino-terminal membrane insertion signals (signal peptides), followed by a segment highly likely to be membrane-spanning and a variable number of possible transmembrane domains, were found in AtFORMINs 1, 2, 4, 6 and 8. A possible membrane insertion signal was also identified in AtFORMIN5 by one of the two methods used (see Materials and methods, and Figures 2 and 4). The length of the predicted signal peptides suggests that they may represent membrane anchors rather than secretion signals [28]. A putative transmembrane segment was also found in the apparently amino-terminally truncated sequence of AtFORMIN3. In contrast, no signal peptides were found in the 12 fungal and animal formins listed in Materials and methods, although transmembrane-like segments were observed in some. Surprisingly, the putative transmembrane segment lies between the two Pro-rich regions in AtFORMINs 2, 6 and 8, so only the cytoplasmic one of these two motifs can act as a conventional FH1 domain. Among the Arabidopsis formins, the FH1 domain ranges in size from 106 to 423 amino acids, with a proline content of 13 to 41% and multiple stretches of five to nine consecutive proline residues. This structure roughly corresponds to that of previously characterized FH1 domains [10]. Interestingly, the FH1 domains of AtFORMINs 2, 7 and 8 are extremely rich in serine (up to 20%) and contain stretches of up to seven consecutive serine residues.
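The proline and serine figures quoted above are simple composition measures. A minimal Python sketch, assuming the FH1 region has already been delimited (as here, by SMART and visual inspection), computes the fraction of a given residue and the length of its longest run of consecutive copies. The function name and the example fragment are hypothetical.

```python
def composition_stats(seq, residue):
    """Return (fraction of `residue` in `seq`, length of its longest uninterrupted run)."""
    seq, residue = seq.upper(), residue.upper()
    if not seq:
        return 0.0, 0
    longest = current = 0
    for aa in seq:
        # Extend the current run on a match, otherwise reset it.
        current = current + 1 if aa == residue else 0
        longest = max(longest, current)
    return seq.count(residue) / len(seq), longest

fragment = "PPPAPSSSSP"  # hypothetical FH1-like fragment
print(composition_stats(fragment, "P"))  # (0.5, 3)
print(composition_stats(fragment, "S"))  # (0.4, 4)
```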

Figure 4

Putative membrane anchors and transmembrane domains of Arabidopsis formins. Aliphatic (I, L, V), aromatic (F, H, W, Y) and other potentially hydrophobic (A, C, G, K, M, R, T) amino acids are highlighted.
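The highlighting scheme in Figure 4 is a fixed mapping from residues to three classes. The Python sketch below reproduces that mapping so that a candidate membrane anchor can be annotated in plain text; the one-letter class codes and the function name are hypothetical, and the sketch does not itself predict transmembrane segments (in this study that was done with SMART and SignalP).

```python
# Residue classes as listed in the Figure 4 legend.
ALIPHATIC = set("ILV")
AROMATIC = set("FHWY")
OTHER_HYDROPHOBIC = set("ACGKMRT")

def highlight(seq):
    """Map each residue to a class code: 'a' aliphatic, 'r' aromatic,
    'h' other potentially hydrophobic, '-' not highlighted."""
    codes = []
    for aa in seq.upper():
        if aa in ALIPHATIC:
            codes.append("a")
        elif aa in AROMATIC:
            codes.append("r")
        elif aa in OTHER_HYDROPHOBIC:
            codes.append("h")
        else:
            codes.append("-")
    return "".join(codes)

print(highlight("MLLPFSDE"))  # haa-r---
```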

The other proline-rich domain of AtFORMINs 2, 6 and 8 is predicted to be exposed to a non-cytoplasmic compartment. Given that polyproline stretches are characteristic of a class of structural cell-wall proteins known as extensins [29], it is tempting to speculate that this domain may mediate communication between formins and structures within the cell wall. Apart from this, few predictions of function can be made on the basis of the sequence data. Although formins are well conserved with respect to their molecular structure, we do not know the extent of their conservation within signaling or structural modules [21]. As the relationships between protein structure, module structure and biological function are far from straightforward [30], we can at present neither prove nor exclude the possibility that plant formins contribute to functional modules similar to those of their animal and fungal counterparts. Whether these proteins have a direct role in cytokinesis, in mitotic spindle positioning, or in some other cellular process, possibly involving cytoskeletal rearrangement or cell-surface growth, will have to be determined experimentally.

Conclusions

A systematic search of the available Arabidopsis genomic and cDNA sequences revealed eight genes encoding proteins that define a novel subfamily of the formin family. At least six of the eight Arabidopsis formins appear to be integral membrane proteins. This suggests a mechanism of membrane localization that may be specific to plants and may be functionally related to a possible role for formins in communication between the plant cell and extracellular structures.

Materials and methods

Identification of Arabidopsis formin homologs and protein sequence prediction

The initial search for formin homologs in the non-redundant Arabidopsis thaliana protein (NRAT) database, performed using the PatMatch program [31,32] with the query pattern L-x-x-G-N-x-M-N, yielded three potential formin homologs: AtFORMIN1 to AtFORMIN3. AtFORMINs 2 to 8 were found by a TBLASTN 2.0 search [33,34] of GenBank, using the predicted protein sequence of AtFORMIN1 as the query (P(N) values in the range of 5.8 × 10⁻²²⁷ to 1.3 × 10⁻¹¹). Known members of the formin family (a human Diaphanous homolog and Drosophila melanogaster Cappuccino) were found in the same search (P(N) values 1 × 10⁻²¹ and 1.3 × 10⁻¹³, respectively), supporting the statistical significance of the initial PatMatch results.

Intron positions in the genomic sequences were determined (or confirmed) using the NetGene2 server [25]. Translation of the DNA sequences was performed on the SIB ExPASy WWW server [35,36]. Only the longest predicted ORFs were subjected to further analysis.

Sequence alignment and domain structure analysis

All sequence comparisons were done on a set of 20 metazoan, yeast and plant formin sequences. These were FUGU, Fugu rubripes formin homolog gb|AAC34395.1; LFORMIN, mouse lymphocyte-specific formin gb|AAD01273; BNR1, yeast Bnr1 protein sp|P40450; BNI1, yeast Bni1 protein sp|P4183; FHOS, human formin-like protein gb|AAD39906.1; CAENO, Caenorhabditis elegans formin homolog gb|AAB42354.1; CAPPU, D. melanogaster Cappuccino gb|AAC46925.1; P140MDIA and P134MDIA2, mouse Diaphanous homologs gb|AAC53280 and gb|AAC71771.1; DIA-DROME, D. melanogaster Diaphanous sp|P48608; CYK1, C. elegans Cyk1 assembled from gb|AAA81161.1 and gb|AAC17501.1; MFORMIN, mouse formin sp|Q05860; and AtFORMIN 1 to 8. Protein sequences were aligned with the aid of MACAW [37], using the Gibbs sampler and segment pair algorithms with the BLOSUM45 matrix. Only blocks with P < 10⁻⁷ were considered. No homology to FH3 as defined by Petersen et al. [23] or to the amino-terminal conserved region [10] was revealed by this tool, whereas the FH2 domain was readily identified. Non-aligned parts of the sequence within the FH2 domain were adjusted manually. The consensus of the resulting FH2 alignment (deposited in the EMBL alignment database, accession number DS39866) was calculated for each subdomain separately (see Figure 1) by the method of Brown and Lai [38,39].
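The consensus line in Figure 1 reports residues shared by at least 80% of the aligned sequences. The study computed it by the method of Brown and Lai [38,39]; the Python sketch below swaps in a simpler rule (the most frequent non-gap residue of a column is kept if it reaches the threshold), so it illustrates the idea of a threshold consensus rather than reproducing that method. The function name and toy alignment are hypothetical.

```python
from collections import Counter

def threshold_consensus(aligned, threshold=0.8, gap="-"):
    """Per-column consensus of equal-length aligned sequences.

    A column gets its most frequent non-gap residue if that residue occurs in
    at least `threshold` of all sequences; otherwise it is shown as '.'.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    n = len(aligned)
    out = []
    for column in zip(*aligned):
        counts = Counter(res for res in column if res != gap)
        if not counts:
            out.append(".")
            continue
        residue, k = counts.most_common(1)[0]
        out.append(residue if k / n >= threshold else ".")
    return "".join(out)

toy = ["LKRGN", "LKAGN", "LQAGN", "LKAGD", "LKSGN"]  # hypothetical
print(threshold_consensus(toy))  # LK.GN
```

Counting gaps in the denominator, as done here, means that a column dominated by insertions in a few sequences cannot reach the threshold.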

The SMART program [26,27] was used to examine predicted protein sequences for the presence and location of known sequence domains, putative secretion signals, transmembrane segments, coiled-coil motifs and low-complexity regions (usually representing proline-rich FH1 domains, whose location was confirmed by visual inspection). Prediction of signal peptides by the neural network (NN) method [28] was independently verified by a hidden Markov model (HMM)-based method on the SignalP 2.0 server [40,41]. Results of both methods were in agreement, with the exception of AtFORMIN5, which was predicted to be membrane-anchored by NN but cytoplasmic by HMM.

Construction of the evolutionary tree

The tree (Figure 3) was calculated from the three FH2 subdomains present in all formins studied, using programs from the PHYLIP package, version 3.573 [42,43]. An input file was prepared by concatenating subdomains a, c and h, and SEQBOOT was used to generate a bootstrapped data set of 500 replicates. Distances were calculated by PROTDIST using the PAM distance matrix, and trees were constructed from them by the neighbor-joining method [44] using NEIGHBOR. The consensus tree was determined by CONSENSE and plotted using DRAWTREE.
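The neighbor-joining step performed by NEIGHBOR can be illustrated with a short Python sketch of the neighbor-joining method [44] applied to a precomputed, symmetric distance matrix (PAM distances from PROTDIST in this study). Bootstrapping and consensus are omitted; the function name, the labels and the toy five-taxon matrix (a standard textbook example, not formin data) are hypothetical. Ties in the Q-criterion are broken by input order, so other programs may write an equivalent tree differently.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Neighbor-joining on a symmetric distance matrix.

    Returns (newick, joins); joins lists (node_i, node_j, len_i, len_j) for each
    agglomeration step. Internal nodes are labelled by their own Newick subtree.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    joins = []
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Q-criterion: join the pair minimising (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        joins.append((i, j, li, lj))
        # Distance from the new internal node to every remaining node.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes left: resolve the final star exactly.
    x, y, z = nodes
    lx = (d[x][y] + d[x][z] - d[y][z]) / 2
    ly = d[x][y] - lx
    lz = d[x][z] - lx
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});", joins

names = ["a", "b", "c", "d", "e"]
D = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
print(neighbor_joining(names, D)[0])  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Because this toy matrix is additive, the path lengths in the returned tree reproduce the input distances exactly; real protein distances rarely are, which is one reason bootstrap support is reported alongside the topology.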