Background

Proteoglycans are important and ancient mediators of cell interactions in metazoan organisms. In the simplest multicellular animals, sponges, extracellular proteoglycans contribute to adhesion between cells and processes of self-recognition and host defence [1, 2]. The syndecans are the only major family of transmembrane proteoglycans and are conserved in metazoa from nematode worms to man. The genomes of invertebrates such as C. elegans and D. melanogaster contain a single syndecan gene, whereas mammalian genomes contain four syndecan genes [3, 4]. In all the organisms in which they have been studied, syndecans have important roles in cell interactions, adhesion, migration and receptor signaling. The single syndecans of C. elegans and D. melanogaster are required for proper axon guidance during development of the nervous system [5–8]. C. elegans syndecan also functions in vulval development [9]. The vertebrate syndecans each have distinct tissue expression patterns and distinct functional attributes. For example, syndecan-1 is expressed in epithelia and certain mesenchymal cells. Syndecan-1 null mice show deficits in cell migration behaviors associated with inflammatory response and wound-healing [10, 11]. In single cells, syndecan-1 activates formation of filopodia and lamellipodia that mediate cell motility [12, 13]. Syndecan-2 is expressed by mesenchymal, liver and neuronal cells. It participates in the development of the left/right body axis during Xenopus development [14], has been identified to function in the assembly of neuronal dendritic spines and in ECM assembly in mammalian systems, and is essential for angiogenesis in zebrafish [15–17]. Syndecan-3 is mostly expressed in the nervous system and is important in hippocampal function, particularly in controlling feeding behavior and hippocampus-dependent memory [18, 19]. Syndecan-4 is broadly expressed by many tissues and cell types and has specific signaling roles as an integrin co-receptor in the assembly of focal adhesions. It also contributes to angiogenesis and wound-healing in mice [4, 20, 21].

In mammals, syndecans have significant clinical relevance in the pathologies of infection, cancer and wound repair. Syndecan-1 is a receptor for human immunodeficiency virus in cell culture [22] and contributes to the pathology of infection by Pseudomonas aeruginosa and Staphylococcus aureus [23–25]. Syndecan-1 deficient mice have increased resistance to wingless-1-mediated tumorigenesis [26]. Altered expression of syndecan-1 has been documented in many human carcinomas, lymphomas and in multiple myeloma; in multiple myeloma syndecan-1 has been proposed for clinical use as a prognostic marker [27–34]. Other studies have implicated syndecans in tissue repair. Thus, the lack of syndecan-1 or -4 in mice results in impaired inflammatory responses, wound-healing, and angiogenesis [10, 21, 35]. Studies in animal models have demonstrated increased expression of syndecan-1, -3 and -4 after cardiac injury, indicating possible roles in cardiac remodeling [36–38].

In structure, all syndecans consist of an extracellular domain with sites for attachment of glycosaminoglycan (GAG) sidechains, a single transmembrane domain and a short cytoplasmic domain. Through their GAG chains, syndecans act as co-receptors for the cell-surface binding of heparin-binding growth factors such as bFGF (basic fibroblast growth factor) and Wnt (wingless) [3, 4, 26, 39]. The activity of invertebrate syndecans in axon guidance depends on co-receptor activity in the slit/Robo pathway [5, 8]. Direct protein-protein interactions of the extracellular domain have also been implied for syndecan-1 and -4 [40]. Based on comparisons of mammalian, C. elegans and D. melanogaster syndecans, the cytoplasmic domains are recognized to contain two conserved regions, designated C1 and C2, that are present in all these syndecans, and a central variable (V) region that is unique to each form of syndecan. The V regions themselves are very highly conserved between species orthologues of syndecans -1 to -4, suggestive of important specific functions [3, 4]. In general, the cytoplasmic domains participate in the assembly of juxtamembrane complexes that regulate cell signaling and the organizational state of the actin cytoskeleton. The C2 region binds PDZ-containing proteins that form multiprotein scaffolds by oligomerization and which also mediate syndecan recycling [41]. Cask, src tyrosine kinase and synectin are known binding partners of the C1 region. The V regions have been most intensively studied in syndecan-2 and syndecan-4, and have specific binding partners that contribute to cell signaling that regulates cytoskeletal structures [3, 4, 15]. On the basis of their protein sequences, the four mammalian syndecans group into two pairs: syndecan-1 and -3 have higher sequence identity with each other, as do syndecan-2 and syndecan-4 [3, 4].

To date, the relationships of invertebrate and vertebrate syndecans have been considered only at the level of their protein sequences. With regard to these analyses, there are gaps in our knowledge of chordate syndecans, as the most intensive studies have focused on mammalian and amphibian syndecans [e.g., 3, 4, 10, 11, 14, 18, 20, 21]. Invertebrate syndecans have been experimentally studied in D. melanogaster and C. elegans, but little is known about other invertebrate syndecans. It is now recognized that the genomes of D. melanogaster and C. elegans have evolved rapidly with extensive gene loss [42, 43]. Thus, one purpose of our study was to capitalize on the recent sequencing of the genomes of the basal chordate, Ciona intestinalis, the chicken and three species of fish, along with the expanded knowledge from the genomes of the sea urchin Strongylocentrotus purpuratus, the amphioxus Branchiostoma floridae, multiple representatives of the Cnidaria, and the many expressed sequence tag projects directed to invertebrate species, to obtain a comprehensive view of this important proteoglycan family at protein sequence level. A second purpose has been to develop a new level of knowledge for the whole syndecan family, by analyzing the genomic contexts of syndecan genes in multiple organisms. The comparative genomics of gene families is a powerful approach to understanding the relationships between the members of multigene families, either within a single genome or through comparison of the conservation of genomic context across the genomes of multiple organisms. In addition, ideas about the origins of gene families can be clarified through the study of the genomes of modern animals from different phyla. The insights obtained from comparative genomics are relevant to making choices of model organisms for experimental purposes and may advance understanding of the roles of gene families at systems level [44].

Results

Molecular phylogeny of syndecans across the animal kingdom

Our searches of recently completed genome sequences and the database of expressed sequence tags produced an expanded dataset of syndecans that demonstrates remarkable conservation of this proteoglycan family across the animal kingdom. Syndecans were identified in multiple species of Cnidaria as well as throughout the Bilateria. The genomes of Ciona intestinalis, Strongylocentrotus purpuratus and amphioxus (Branchiostoma floridae) encode single syndecans and multiple syndecans are encoded in fish and other vertebrate genomes (Table 1). However, no syndecan-1 sequence was identified in fish, notwithstanding extensive searches of three fish genomes and the large database of expressed sequence tags from many teleost fish species.

Table 1 Accession numbers of Syndecans.

The sequence relationships of selected invertebrate and fish syndecans to human syndecans were examined by calculating WU-BLASTP conservation scores [45]. X. laevis syndecans were included for comparison. Whereas each of the X. laevis and T. nigroviridis syndecans was clearly most closely related to one of the human syndecans (e.g., T. nigroviridis syndecan-3 has the highest score with human syndecan-3, etc), each of the invertebrate syndecans had similar low conservation scores with all the human syndecans and in some cases no hit was obtained (Table 2). This suggests that the sequences of invertebrate syndecans are similarly related to all four of the vertebrate syndecans.

Table 2 Sequence conservation of representative invertebrate, fish and amphibian syndecans.

The cytoplasmic domains have been recognized as the most conserved region of syndecan protein sequences [3, 4]. To examine the conservation of cytoplasmic domains across the new dataset, we aligned the sequences by the CLUSTALW progressive local alignment method [46]. The alignment demonstrated near complete conservation of the C1 region in all syndecans, with some variation at the second and third positions in the vertebrate syndecan-2 group, D. rerio syndecan-4, and many of the invertebrate syndecans (Figure 1). The C2 region is almost universally conserved in Bilateria as EFYA, with the exception of Ciona syndecans that contain EYYA and fish syndecan-4s that contain EIYA. Greater variability of the C2 motif is evident in the Cnidaria (Figure 1). All of these variants are predicted to be functional PDZ-binding motifs [47].
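
The C2 comparison described above amounts to reading the last four residues of each cytoplasmic domain and matching them against the variants named in the text. A minimal Python sketch follows, assuming the sequences are available as plain one-letter strings; the function name `classify_c2` and the toy inputs are hypothetical and are not taken from the dataset or from any tool used in this study.

```python
# C2 variants named in the text for the Bilateria.
# Cnidarian variants are not listed in the text, so they are not encoded here.
BILATERIAN_C2 = {
    "EFYA": "almost universal in Bilateria",
    "EYYA": "Ciona syndecans",
    "EIYA": "fish syndecan-4s",
}


def classify_c2(cytoplasmic_domain: str) -> tuple[str, str]:
    """Return the C-terminal tetrapeptide and a label for it."""
    seq = cytoplasmic_domain.strip().upper()
    if len(seq) < 4:
        raise ValueError("sequence shorter than the four-residue C2 motif")
    motif = seq[-4:]
    return motif, BILATERIAN_C2.get(motif, "other variant")


# Toy inputs only: 'X' is a placeholder residue, not real sequence data.
for toy in ("XXXXXXEFYA", "XXXXXXEYYA", "XXXXXXEIYA", "XXXXXXEFYV"):
    print(toy, classify_c2(toy))
```

A lookup of this kind only labels the C-terminal residues; whether a given variant is a functional PDZ-binding motif is a separate prediction [47] that the sketch does not attempt.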

Figure 1

Conservation of syndecan cytoplasmic domains. Amino acid sequences of syndecan cytoplasmic domains from different species were aligned in ClustalW. The alignment is presented in Boxshade 3.2. Black shading shows identical amino acids; gray indicates conservative substitutions and no shading indicates unrelated amino acids. V = variable region. The very highly-conserved leucine and tyrosine residues are indicated by asterisks. The central hypervariable region (13) is also indicated. Key: Ac, Anthocidaris crassispina; Ap, Acropora palmata; At, Acropora tenuis; Bf, Branchiostoma floridae; Ce, Caenorhabditis elegans; Ci, Ciona intestinalis; Cs, Ciona savignyi; Dm, Drosophila melanogaster; Dr, Danio rerio; Es, Euprymna scolopes; Gg, Gallus gallus; Hm, Hydra magnipapillata; Hs, Homo sapiens; Mj, Marsupenaeus japonicus; Mm, Mus musculus; Nv, Nematostella vectensis; Rp, Rhipicephalus appendiculatus; Sm, Schmidtea mediterranea; Sp, Strongylocentrotus purpuratus; Tn, Tetraodon nigroviridis; Xl, Xenopus laevis; Xt, Xenopus tropicalis. All syndecans are abbreviated as S.

The alignment also demonstrated that the invertebrate syndecans from different phyla contain distinctive V regions, all of which are different from those of syndecan-1, -2, -3 and -4. In vertebrate syndecans, the syndecan-4 V region is well-conserved between fish and tetrapods. In the syndecan-2 group, the V region of D. rerio syndecan-2 is very similar to that of tetrapod syndecan-2s, whereas in T. nigroviridis the central hypervariable region is distinct at 3 residues. The V regions of syndecan-3s are well-conserved between fish, chicken, mouse and human, yet the V regions of the Xenopus syndecan-3s have unique features that result in a separate clustering in the sequence alignment (Figure 1). Most strikingly, the alignment highlighted the universal conservation of the first leucine residue in the V regions of all animals. A single tyrosine residue is also very well-conserved across the Bilateria but is not present in any of the V regions from the Cnidaria (asterisks in Figure 1).

To compare all the full-length sequences in the dataset, we used the TCOFFEE alignment algorithm that combines pairwise/global and local alignment methods into a single model and is more accurate than CLUSTALW for sequences of lower identity [48]. The Phylip distance matrix output was used to prepare an unrooted tree. This analysis confirmed the absence of syndecan-1 from the fish species. Fish and amphibian syndecan-2, -3 and -4 all grouped clearly with their tetrapod counterparts. Each set of syndecan-1, -2, -3 and -4 sequences formed a distinct group within the phylogenetic diagram, and the syndecan-2 and -4 groups appeared more closely related to each other than to other syndecans. X. laevis syndecan-1 has unusual histidine-rich sequence features [49] and therefore forms a deep-branching node with tetrapod syndecan-1s; nevertheless the syndecan-1 grouping was clear (Figure 2; the sequence of X. tropicalis syndecan-1 is similar to that of X. laevis; X. tropicalis syndecan-1 is present on scaffold 100 in genome assembly 4.1) [50].

Figure 2

Sequence relationships of invertebrate and vertebrate syndecans. The amino acid sequences of all the full-length syndecans in the dataset were aligned in TCOFFEE. The Phylip distance matrix output was used to prepare an unrooted phylogenetic diagram in DRAWTREE. Key is the same as in Figure 1. Scale bar = 0.1 substitution/site.

To evaluate the robustness of these groupings, further analyses of the full-length sequences and the cytoplasmic domains were conducted using the program PHYML, which estimates phylogenies based on the maximum-likelihood principle [51], with inclusion of bootstrap analysis [52]. The unrooted tree derived from the full-length sequences confirmed the distinct groupings of the sets of syndecan-1, -2, -3 and -4 and also demonstrated the syndecan-1/syndecan-3 and syndecan-2/syndecan-4 pairings. These topologies were supported by robust bootstrap values (at least 67%, with the exception of the G. gallus/X. laevis syndecan-1 node) (Figure 3A). However, the positioning of invertebrate syndecans relative to vertebrate syndecans was not well supported, likely due to the extent of divergence of their extracellular domain sequences (Figure 3A). Most of the subgroupings within the invertebrates appeared biologically meaningful and were well-supported by the bootstrap analysis, with the exception of the placements of the amphioxus sequence at a node with Cnidarian syndecans and C. elegans at a node with echinoderm syndecans (Figure 3A). It is likely that these anomalies are an artifact of long branch attraction. At present, there are no additional cephalochordate or nematode syndecan sequences to include in the dataset. The same general topologies for the vertebrate syndecans were supported in the tree prepared from the cytoplasmic domains. Again, not all nodes within the invertebrate syndecan group were biologically meaningful (M. japonicus syndecan was placed as an outgroup of the Cnidaria), and the central nodes between vertebrate and invertebrate syndecans were not robust (Figure 3B). In summary, the multiple molecular phylogenetic methods of analysis of syndecan protein sequences support the pairings of syndecan-1 with syndecan-3, and syndecan-2 with syndecan-4, and indicate that the invertebrate syndecans are distantly and equivalently related to the four vertebrate syndecans.
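
The bootstrap values quoted above rest on resampling alignment columns with replacement and rebuilding the tree from each pseudo-replicate. The Python sketch below illustrates only the resampling step, as a generic illustration of the nonparametric bootstrap; it is not PHYML's implementation, and the function name `bootstrap_replicate` and the toy alignment are hypothetical.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    lengths = {len(row) for row in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw as many column indices as the alignment has columns.
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(row[i] for i in picks) for row in alignment]


# Toy alignment of three short rows; placeholder data, not syndecan sequence.
toy_alignment = ["ACDE-F", "ACDEGF", "TCDE-F"]
rng = random.Random(0)  # fixed seed so the run is repeatable
replicates = [bootstrap_replicate(toy_alignment, rng) for _ in range(100)]
print(len(replicates), "replicates of width", len(replicates[0][0]))
```

In a full analysis each replicate would be passed to the tree-building program, and the support for a node would be the percentage of replicate trees that recover the same grouping.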

Figure 3

Phylogenetic trees of full-length and cytoplasmic domain of syndecans. The CLUSTALW outputs of TCOFFEE alignments of the amino acid sequences of (A), full-length, and (B), cytoplasmic domains, of syndecans in our dataset were used to generate unrooted phylogenetic trees by the maximum likelihood reconstruction method, PHYML, using the WAG substitution model. Bootstrap analysis was run for 100, (A), or 500, (B), cycles and the bootstrap replication values are shown at each node. Scale bars = 0.1 substitution/site.

Vertebrate syndecan genes show extensive conservation of synteny

To obtain a clearer perspective on the relationships of the four vertebrate syndecans, we evaluated the vertebrate syndecan gene family at genomic level by comparing the conservation of the gene neighbors of each syndecan gene between the mapped genomes of human, mouse, chicken and, where available, fish. In each genome, each syndecan gene is located on a different chromosome. For each family member, a set of conserved gene neighbors could be identified. For the syndecan-1 gene, six local neighboring genes (HS1BP, RHOB, PUM2, LAPTM4A, MATN3 and WDR35) and, more remotely, KCNS3, were conserved in all three species (Figure 4A).

Figure 4

Conservation of synteny between vertebrate syndecan genes. The genomic contexts of: (A), syndecan-1 genes; (B), syndecan-2 genes; (C), syndecan-3 genes and (D), syndecan-4 genes, were analysed for conservation of neighboring genes in the human, mouse, chicken, T. nigroviridis and D. rerio genomes. Each diagram represents the order of genes along a portion of the indicated chromosomes. Each horizontal line represents a gene: red lines represent genes that are syntenic and black lines represent non-conserved intervening genes. The numbers indicate the exact location in nucleotides of each region on its respective chromosome in each species. HUGO gene names are given, and the syndecan genes are indicated in red.

For the syndecan-2 gene, STK3 and PGCP were conserved as neighbouring genes in human, chicken and mouse. TSPYL5, LAPTM4B, MATN2 and KCNS2 genes were also conserved between human and mouse and PTDSS1, GDF16 and TP53INP were also conserved between human and chicken (Figure 4B). In comparing the locus of the syndecan-2 gene in T. nigroviridis, PGCP was also conserved as a neighboring gene (Figure 4B). MATN2 was not present in this region, but additional searches identified that it is indeed encoded on chromosome 8 (discussed further below). In D. rerio, the syndecan-2 gene on chromosome 5 has no gene neighbors in common with those of tetrapods or T. nigroviridis, and we infer that this gene underwent a lineage-specific transposition (data not shown).
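
The comparison of syndecan-2 neighbors in tetrapods can be restated as a set intersection. The Python sketch below uses only the conserved genes named in the paragraph above, so each set is a partial list rather than a complete gene map of the region; the helper name `shared_neighbors` is hypothetical.

```python
# Conserved SDC2 neighbors as listed in the text (partial lists only).
sdc2_neighbors = {
    "human": {"STK3", "PGCP", "TSPYL5", "LAPTM4B", "MATN2", "KCNS2",
              "PTDSS1", "GDF16", "TP53INP"},
    "mouse": {"STK3", "PGCP", "TSPYL5", "LAPTM4B", "MATN2", "KCNS2"},
    "chicken": {"STK3", "PGCP", "PTDSS1", "GDF16", "TP53INP"},
}


def shared_neighbors(neighbors, species):
    """Genes present next to the syndecan gene in every listed species."""
    return set.intersection(*(neighbors[s] for s in species))


in_all_three = shared_neighbors(sdc2_neighbors, ["human", "mouse", "chicken"])
print(sorted(in_all_three))  # ['PGCP', 'STK3']

# Genes shared by human and mouse but not reported for chicken.
human_mouse_only = (shared_neighbors(sdc2_neighbors, ["human", "mouse"])
                    - sdc2_neighbors["chicken"])
print(sorted(human_mouse_only))  # ['KCNS2', 'LAPTM4B', 'MATN2', 'TSPYL5']
```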

PUM1 and MATN1 were conserved gene neighbors of the syndecan-3 gene in human, mouse, chicken and T. nigroviridis, and MATN1 also mapped adjacent to D. rerio SDC3 (Figure 4C). The most extensive local synteny was between human and mouse: TDE2l, FABP3, LAPTM5 and PTPRU were all conserved as local gene neighbors (Figure 4C). Of these genes, FABP3 was also identified to be syntenic with SDC3 in T. nigroviridis. The gene encoding the tyrosine kinase Lck was also local to SDC3 in human and mouse. In the two fish genomes, genes encoding tyrosine kinases were also adjacent to SDC3; however, the gene products were most similar to other members of the tyrosine kinase family (Figure 4C).

For the syndecan-4 gene, strong conservation of the genomic regions was apparent between human, mouse and chicken: KCNS1, MATN4 and RBPSUHL were conserved between all three species and additional local genes were conserved between human and mouse (Figure 4D). SDC4 is not yet mapped to a chromosome in either the T. nigroviridis or D. rerio genomes.

The high conservation of the genomic context of vertebrate syndecan genes prompted us to examine whether this synteny extends to the urochordate, Ciona intestinalis. The urochordates are basal in the chordate lineage and have not undergone whole genome duplications [53]. The single syndecan gene of C. intestinalis is encoded on chromosome 2, scaffold 125 [54]. The C. intestinalis genome encodes a single matrilin (Gene Cluster 14563); however, this is located on scaffold 259 on chromosome 1. (This gene product has highest identity to matrilin-3. The Gene Cluster 00035/06628 that is assigned as "matrilin-2" at the Ghost website has highest identity to fibrillin.) Of the other gene families consistently represented on the same chromosomes as tetrapod syndecan genes, only Gdf16 was identified on C. intestinalis chromosome 2 (scaffolds 213 and 58). Ten local gene neighbours of the syndecan gene of C. intestinalis are not syntenic with any human or chicken syndecan gene (data not shown). Thus, there are many differences in the genomic context of C. intestinalis syndecan.

Paralogous locations of syndecan and matrilin genes in the human genome

Overall, the data on the genomic contexts of vertebrate syndecan genes demonstrated two striking points. First, in considering the individual family members, there is clear conservation of synteny between fish and tetrapods for each of the syndecan-2, -3 and -4 genes. This indicates that these loci must each have been present in the last common ancestor of fish and tetrapods. Secondly, a deeper level of homology is apparent, in that the syndecan-1, -2, -3 and -4 genes in the organisms we examined all have conserved gene neighbours that are paralogous members of the same gene families. The clearest example of this is the matrilins, which are components of extracellular matrix and which in tetrapods also comprise a gene family of four members [55, 56]. With the exception of chicken syndecan-2, a matrilin gene is located near to every examined tetrapod syndecan gene. The same trend was apparent for the fish syndecan-3 genes (Figure 4). Members of the LAPTM4 and KCNS gene families were also present in the local genomic region of most of the tetrapod syndecan genes. The two members of the pumilio gene family were conserved adjacent to the syndecan-1 and -3 genes (Figure 4).
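
The second point, that paralogous members of the same gene families lie next to different syndecan genes, can be tallied per family. The Python sketch below records the human neighbors named in the Results text; the dictionary layout and the name `loci_per_family` are hypothetical, and grouping LAPTM5 with the LAPTM4 genes under one "LAPTM" heading is an assumption made here for illustration.

```python
ALL_LOCI = ("SDC1", "SDC2", "SDC3", "SDC4")

# Human neighbors named in the text, grouped by gene family.
family_neighbors = {
    "matrilin": {"SDC1": "MATN3", "SDC2": "MATN2",
                 "SDC3": "MATN1", "SDC4": "MATN4"},
    # LAPTM5 at SDC3 is reported for human and mouse only.
    "LAPTM": {"SDC1": "LAPTM4A", "SDC2": "LAPTM4B", "SDC3": "LAPTM5"},
    "KCNS": {"SDC1": "KCNS3", "SDC2": "KCNS2", "SDC4": "KCNS1"},
    "pumilio": {"SDC1": "PUM2", "SDC3": "PUM1"},
}


def loci_per_family(families):
    """Map each family to the sorted syndecan loci where a member is found."""
    return {family: sorted(members) for family, members in families.items()}


summary = loci_per_family(family_neighbors)
for family, loci in summary.items():
    print(f"{family}: {len(loci)} of {len(ALL_LOCI)} loci ({', '.join(loci)})")

# Families with a member next to every syndecan gene.
families_at_all_loci = [f for f, loci in summary.items()
                        if loci == list(ALL_LOCI)]
```

With these inputs only the matrilin family has a member at all four loci, which is why the matrilins serve as the main markers in the later analysis of fish genomes.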

These findings strongly suggest that, in each genome, the four syndecan genes are located in paralogous genomic regions that have been conserved throughout the evolution of vertebrates. The existence of such regions in vertebrate genomes provides evidence of whole genome duplication events that took place early in chordate evolution [57–60]. In the lineage of each of the organisms examined, subsequent gene loss or localized rearrangements have blurred the initial four-fold replication of the region in distinct ways over time [61]. In the human genome, the rate of DNA rearrangement is slower than in rodents [61] and paralogous regions have been identified globally by BLASTP-based searches of the genome against itself [57, 58]. To substantiate the view obtained from our analysis of local genes on an individual basis, we examined whether the human genome contains evidence of chromosomal paralogies in the regions of the four syndecan genes according to this unbiased independent method. Searches were made through the database of paralogons in the human genome, v5.28 [57]. Strikingly, the genomic region of each human syndecan gene was found to be related to the genomic region of all the other family members. Members of the matrilin family were included in all blocks, and other gene families identified in our local searches were represented in individual paired blocks along with genes that are more distant on each chromosome (Figure 5). Together, these findings provide strong evidence that the four vertebrate syndecan genes have evolved as a consequence of rounds of duplication of a single ancestral chromosomal region.

Figure 5

Paralogous locations of syndecan genes within the human genome. Paralogous regions covering all four members of the syndecan gene family were identified in the human genome, from the "dataset of paralogons in the human genome v5.28" [57]. Each block number and the exact location of the paralogous regions on the respective chromosomes are also shown. Shaded lines represent how the original blocks were modified to include missing syndecan genes (see methods).

The absence of syndecan-1 in fish: analysis of the genomic contexts of matrilin genes in T. nigroviridis and D. rerio

Because synteny of syndecan-2 and -3 genes is conserved between tetrapods and pufferfish, and because all four syndecan gene loci show evidence of paralogy in tetrapods, we inferred that the absence of syndecan-1 from fish is a derived characteristic. This raised the question of whether other parts of the conserved chromosomal region are present in fish, even though the syndecan-1 gene itself is missing. We addressed this question through the proximity of syndecan and matrilin genes. In all tetrapods examined, MATN3 is close to SDC1 (Figure 4). All four forms of matrilin are present in T. nigroviridis (Table 3) and both T. nigroviridis and D. rerio matrilins include two matrilin-3 paralogues (Table 3) [62]. The existence of such paralogous pairs is consistent with the strong evidence for an additional whole genome duplication early in the evolution of teleost fish [63]. We examined the genomic contexts of matrilin genes in T. nigroviridis and D. rerio. In T. nigroviridis, all four matrilins, including MATN3, are encoded within genomic regions syntenic to those of tetrapod matrilins (Figure 6A). MATN2 is encoded at a distinct point on chromosome 8 from SDC2, yet the adjacent genes include LAPTM4B and KCNS2, that are conserved neighbours in tetrapods. Similarly, KCNS2 is adjacent to MATN4 (Figure 6A). We hypothesized that fish matrilin-3s should be encoded within a genomic region similar to that of tetrapod matrilin-3 and syndecan-1. Importantly, the order of genes adjacent to T. nigroviridis MATN3 matched closely with those adjacent to tetrapod MATN3: however, in T. nigroviridis, PUM2 and LAPTM4A are adjacent to each other, suggesting a specific loss of SDC1. Although WDR35 is not present on chromosome 14, we found that this gene is located on chromosome 10 adjacent to the T. nigroviridis paralogue of MATN3a (Figure 6A).

Figure 6

Genomic contexts of matrilin genes in T. nigroviridis and D. rerio. The genomic contexts of matrilin gene family members from (A), T. nigroviridis and (B), D. rerio, were analysed for conservation of neighboring genes with human, mouse and chicken, as shown in Figure 4, and with each other. Partial synteny was identified. Each diagram represents the order of genes along a portion of the indicated chromosomes. Each horizontal line represents a gene: red lines represent genes that are syntenic and black lines represent non-conserved intervening genes. The numbers indicate the exact location in nucleotides of each region on its respective chromosome in each species. HUGO gene names are given, and the matrilin genes are indicated in red.

Table 3 Accession numbers of Matrilins.

The genomic contexts of D. rerio matrilin genes, (some of which were previously mapped by the radiation hybrid method [62] and all of which are now physically mapped in Zebrafish genome assembly Zv5), demonstrated the same points, although the conservation of local neighboring genes was not as extensive for MATN1 as in T. nigroviridis (Figure 6B). In the case of MATN4, the local neighboring genes did not match those of tetrapod MATN4, but RBPSUHL is conserved as a neighboring gene between D. rerio and T. nigroviridis (Figure 6B). As in T. nigroviridis, MATN3a and MATN3b both have gene neighbors that match those of tetrapod MATN3. However, the neighbors of each MATN3 paralogue represent the opposite sides of the conserved genomic region of tetrapods, indicating a break of the chromosome after the additional genome duplication [62]. We did not identify PUM2 in the D. rerio genome (assembly Zv5). We examined the genomic regions of fish matrilin-3s, as shown in Figure 6, more closely for evidence of syndecan-1 coding sequence. BLAT search of the Tetraodon genome with the mRNA nucleotide sequences of human, chicken, or X. laevis syndecan-1 did not yield significant findings: in each case, only short, 10–20 nucleotide regions on many different chromosomes were identified as homologous. Syndecan-1 protein sequence searches identified only the cytoplasmic domain of syndecan-2 on chromosome 21. Pairwise BLAST searches of syndecan-1 coding sequences against the Tetraodon or Danio genomic sequence regions did not identify significant matches. Thus, the syndecan-1 coding sequence is not represented in this genome. In conclusion, our findings provide further evidence for partial synteny of the genomic contexts of syndecan and matrilin genes between fish and tetrapods. Most importantly, the fish genomes provide evidence of complex genomic rearrangements in the region of the matrilin-3 gene that have involved the loss of the syndecan-1 encoding sequence.

Discussion

We report here the results of molecular phylogenetic and comparative genomic analyses of members of the syndecan family of proteoglycans. The presence of syndecans in organisms from Cnidaria to mammals establishes firmly that the syndecan family is ancient in the animal lineage. The Cnidaria, comprising corals, jellyfish, sea anemones and hydroids, are typified by diploblasty, ability for sexual and asexual reproduction and relatively simple body plans. The divergence of the Cnidaria and Bilateria is thought to have taken place between 650–1000 million years ago (MYA) [43]. Nevertheless, genome and EST projects have revealed a high level of conservation of Cnidarian genes and their coding sequences with those of vertebrates, including integrins and components of the extracellular matrix [42, 64]. The identification of syndecan-encoding sequences in multiple species of Cnidaria provides a strong indication that syndecans, like integrins, have participated as mediators of extracellular interactions throughout animal evolution. The data also establish firmly that the encoding of multiple syndecans in a single genome is a vertebrate-specific attribute: the genomes of two basal chordates, Ciona intestinalis and Ciona savignyi, were each found to encode a single syndecan, and single syndecans were identified from the draft genome of the sea urchin S. purpuratus, the Cnidarian species, and the amphioxus B. floridae EST dataset (Table 1).

The molecular phylogenetic studies demonstrated that syndecan extracellular domains have diverged rapidly in sequence – e.g., each of the invertebrate syndecans has an extracellular domain that is distinct from other species and from all the vertebrate syndecans – whereas the cytoplasmic domains contain highly conserved elements. Identification of syndecans across phyla thus relies heavily on recognition of the cytoplasmic domains in database searches. Within all cytoplasmic domains, the C1 and C2 regions are extremely well-conserved. With regard to the distinctive V regions, our study identified two residues that are extremely highly conserved: the first leucine residue (corresponding to L289 of mouse syndecan-1), which is universally conserved, and the tyrosine residue (corresponding to Y300 of mouse syndecan-1), which is conserved throughout the Bilateria but not in Cnidaria (Figure 1). Current knowledge of vertebrate syndecans suggests that these residues could serve either structural or functional roles. The syndecan-4 cytoplasmic domain forms a homodimer in which L186 is important for the stability of the structure because it makes three contacts with the C1 and V regions of the partner cytoplasmic domain [65]. Although other syndecan cytoplasmic domains are not known to form dimers, it is conceivable that the corresponding leucines could make similar contacts with heterologous binding proteins. Y192 in syndecan-4 also makes three contacts within the homodimer [65]. Thus, in syndecan-4 the most critical roles of L186 and Y192 are structural. In contrast, Y300 in syndecan-1 has been implicated in several signaling roles: it appears necessary for the alignment of syndecan-1 with actin stress fibres [66] and a phosphomimetic Y300E mutant inhibits the assembly of lamellipodia and actin-and-fascin bundles by activated syndecan-1 [13]. It is possible that the presence of this additional tyrosine conferred additional signaling properties on syndecans in the Bilateria. The perspective provided by the new syndecan protein sequence dataset will help guide further experimental analysis of syndecan cytoplasmic domains.

To understand the relationships between the four tetrapod syndecans and gain more insight into the origins of fish syndecans, we turned to a comparative genomic analysis of the vertebrate syndecans. The conservation of synteny is a powerful way to resolve the relationships between paralogous and orthologous members of multigene families in different species [67]. Previous phylogenetic coding sequence analyses have provided indications of paralogy within the family, but the data were not conclusive [68, 69]. The loci of the syndecan-1, -2, -3 and -4 genes each showed striking synteny in human, mouse and chicken. With reference to the syndecan genes alone, only partial synteny was detected in fish: whereas in T. nigroviridis and D. rerio the context of SDC3 was clearly syntenic, synteny of SDC2 was partial in T. nigroviridis and absent in D. rerio. Synteny in fish could not be addressed for the absent syndecan-1 and the unmapped syndecan-4. As discussed below, we achieved resolution of this question by the combined analysis of syndecan and matrilin genes in fish.

The identification of conserved neighboring genes in multiple genomes also revealed a deeper level of homology between all four loci within a single genome, in that most of the conserved neighboring genes also corresponded to paralogous members of gene families. This indication that the four syndecan genes are located in paralogous chromosomal regions was confirmed on the basis of an independent, computationally-based method of analysis of the human genome [57]. When the loci of the four mouse syndecan genes were identified by inter-species backcross mapping, it was noted that the syndecan-1, -2 and -3 genes were syntenic with three members of the myc gene family [70]. There is now strong evidence that paralogous regions exist in vertebrate genomes as a result of whole genome duplications that took place in a vertebrate ancestor after the divergence of the amphioxus lineage [53, 57–59]. It appears that the genomic regions of tetrapod syndecan genes have been very well-conserved subsequent to these duplication events. At protein sequence level, there is evidence of pairing within the syndecan family, such that syndecan-1 and -3 are more closely related, as are syndecan-2 and -4 [3, 4]. These pairings are also evident at genomic level: the members of a two-gene family, PUM1 and PUM2, are encoded adjacent to SDC3 and SDC1, respectively, and STK3 and STK4 are encoded adjacent to SDC2 and SDC4, respectively (Figure 4 and Figure 5). Furthermore, the matrilin family also contains pairs: matrilin-1 and -3 are more closely related, as are matrilin-2 and -4 [55, 56]. Collectively, these findings are incorporated into a model for the expansion of the syndecan family from a single syndecan in an ancestral chordate to the current multigene status in modern tetrapods and fish (Figure 7).

Figure 7

Model for the evolution of syndecans in fish and tetrapods. The upper panel of the model shows the hypothesized ancestral genomic context of a single syndecan gene, originating in an ancestral chordate (chordate 2) subsequent to the divergence of the Urochordate lineage. The model assumes that four-fold paralogy was then set up in an ancestral vertebrate as a result of two rounds of whole-genome duplication. This process also sets up pairing within each set of paralogues. The lower right-hand panel represents how the initial complete paralogy has degenerated through gene rearrangements in modern tetrapods. Fish underwent an additional round of genome duplication, the fish-specific genome duplication (FSGD), that would have generated additional paralogous pairs [58]. The lower left-hand panel represents the situation in two modern fish, in which only the two matrilin-3 paralogues have been retained from FSGD. The syndecan-1 locus appears to have been lost early in the fish lineage. Other gene rearrangements or losses appear specific to the zebrafish or pufferfish lineages.

A surprising and important outcome of the molecular phylogenetic and genomic analyses was the absence of syndecan-1 from the genomes of multiple species of fish (D. rerio, T. nigroviridis and T. rubripes). No syndecan-1-like sequences were identified in the three fish genomes, using amphibian, chicken, or human syndecan-1 as the query sequences. Searches of dbEST, which includes ESTs from additional bony fish species, also did not provide any evidence for a fish syndecan-1. However, all four matrilins are represented in fish (Fig. 6) [62]. Through analysis of the genomic contexts of fish matrilin genes, we were able to strengthen the evidence for conservation of the respective genomic regions between fish and tetrapods. The fish-specific genome duplication (FSGD) that is estimated to have taken place around 320 MYA [63, 71] is assumed to have given rise to the paralogous MATN3a and MATN3b genes. We examined in detail the genomic region around the MATN3 genes, as the expected locus of the syndecan-1 gene. The comparison of the loci of MATN3 genes in T. nigroviridis and D. rerio revealed that the genomic region appears to have become split after the genome duplication. An alternative possibility that cannot be excluded on current data is that different sets of genes have been retained alongside MATN3a and MATN3b in each chromosome after the FSGD. Other gene losses or rearrangements (e.g. loss of PUM2 from the D. rerio genome, relocation of WDR35 to a different chromosome in T. nigroviridis; Figure 6) appear lineage-specific and therefore, we infer, were of more recent occurrence. From the fossil record, the zebrafish and pufferfish lineages are estimated to have diverged around 284–296 MYA [71]. Thus the loss of SDC1 appears, in evolutionary terms, to have occurred relatively soon after FSGD. It is possible that SDC1 was lost before, or at the time of, FSGD; however, the available genomes only sample a portion of the fish lineage. Information on the genomes of more basally diverging species of bony fish and a cartilaginous fish would be required to distinguish between these possibilities. Our working model for the current status of fish syndecan genes incorporates the notion of extensive gene loss after FSGD (Figure 7).
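
The timing argument can be made concrete with a short calculation. The sketch below is illustrative only: it uses the two age estimates quoted above and assumes the scenario in which SDC1 was lost after FSGD but before the zebrafish and pufferfish lineages diverged. The function and variable names are ours.

```python
# Age estimates quoted in the text, in millions of years ago (MYA).
FSGD_MYA = 320                 # fish-specific genome duplication
DIVERGENCE_MYA = (284, 296)    # zebrafish/pufferfish split: (youngest, oldest) estimate


def interval_after_fsgd(fsgd_mya, divergence_mya):
    """Return the (shortest, longest) time in million years between FSGD
    and the lineage divergence, given a range of divergence estimates."""
    youngest, oldest = divergence_mya
    return fsgd_mya - oldest, fsgd_mya - youngest


shortest, longest = interval_after_fsgd(FSGD_MYA, DIVERGENCE_MYA)
print(f"Window for a post-FSGD loss shared by both lineages: "
      f"{shortest} to {longest} million years")
```

Under this assumption, a loss shared by both lineages would fall within roughly the first 24 to 36 million years after FSGD. The calculation does not apply to the alternative scenario in which SDC1 was lost before or at the time of FSGD.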

Our findings have several practical implications for future studies of syndecan function. While the zebrafish is an excellent vertebrate model organism for most developmental and disease processes [72], it would not be the model of choice for physiological in vivo studies of syndecan-1. There is evidence from humans, flies and C. elegans that adjacent genes can show correlated expression and, in some instances, function in the same pathway [73, 74]. The possibility of co-expression or functional association between syndecans and matrilins has not been considered, yet it is intriguing that both are ECM-associated proteins with roles in cell interactions during development and in disease.

Conclusion

The syndecans are an ancient family of cell adhesion and signaling proteins in animals. The cytoplasmic domains contain very highly conserved sequences and the extracellular domains have undergone rapid change. The four syndecan genes of vertebrates are syntenic across tetrapods, and synteny of the syndecan-2 and -3 genes is also apparent between fish and tetrapods. Each of the four family members is encoded within paralogous genomic regions in which members of the matrilin family are also syntenic between tetrapods and fish. This genomic organization appears to have been set up after the divergence of urochordates (Ciona) and vertebrates. The syndecan-1 gene appears to have been lost relatively early in the fish lineage. These conclusions provide the basis for a new model of syndecan evolution in vertebrates and a new perspective for experimental analysis of the roles of syndecans in cells and whole organisms.

Methods

Syndecan dataset and molecular phylogeny

A. accession numbers

The accession numbers of syndecan and matrilin family members from different organisms, except T. nigroviridis and C. intestinalis, were obtained by BLASTP and TBLASTN searches of GenBank and dbEST at NCBI. The accession numbers of T. nigroviridis syndecans were obtained by BLAST searches of the T. nigroviridis genome assembly [75]. C. intestinalis syndecan was identified by BLAST searches of the genome and the Gene Clusters [54, 76]. The complete datasets are listed in Tables 1 and 3. Syndecans from additional species were identified by TBLASTN searches of dbEST: the species included in Table 1 were selected as representatives of additional phyla and sub-phyla for our dataset.

B. BLAST searches

The full-length amino acid sequences of syndecan family members from either mouse or human were used to search for syndecans in other species using the BLASTP or TBLASTN alignment algorithms against GenBank, the respective genome databases, and dbEST. The following genome assemblies were searched: H. sapiens (NCBI build 35.1) [77, 78], M. musculus (NCBI build 34 of May 2005) [79], G. gallus (NCBI build 1.1, March 2004) [80], T. rubripes [81] and D. melanogaster (build 4.1, February 2005) [82] using NCBI BLAST 2.2.12. D. rerio (Zv5 assembly of August 2005) was searched at EBI. The C. elegans genome [83] was searched at the Sanger worm informatics website (WS149; assembly of September 2005). The T. nigroviridis genome [75] (assembly of April 2004), X. tropicalis genome assembly v4.1 [50] and C. intestinalis genome and EST database [54, 76] were also accessed. Sequences including the cytoplasmic domains of representatives of planarian (Schmidtea mediterranea) [84], molluscan (Euprymna scolopes), chelicerate (Rhipicephalus appendiculatus) [85], and crustacean (Marsupenaeus japonicus) syndecans were identified in dbEST and the complete sequence of amphioxus (Branchiostoma floridae) syndecan was compiled from overlapping ESTs in dbEST [[53] and Yu, J., Holland, L.Z., Shin-i, T., Kohara, Y., Satou, Y. and Satoh, N. Expressed genes in Branchiostoma floridae project]. Cnidarian syndecan sequences, from Acropora tenuis [86], Acropora palmata (Schwarz, J.A., Brokstein, P., Manohar, C., Coffroth, M.A., Szmant, A. and Medina, M., Coral-Symbiodinium EST Project), Hydra magnipapillata [Bode et al., WashU Hydra EST project], and Nematostella vectensis [43], were identified first by TBLASTN search of Stellabase [87] and Cnidbase [88] with known invertebrate syndecans and then confirmed and extended by BLASTN and TBLASTN searches of dbEST.

C. multiple alignment

Predicted amino acid sequences of the cytoplasmic domains of syndecans from organisms representing different phyla were aligned in CLUSTALW [46] and are presented in Boxshade 3.2. Predicted amino acid sequences of full-length syndecans were aligned in TCOFFEE (version_2.11) with default parameters using pairwise methods [48].

D. tree building

Several parallel methods were used to assess the molecular phylogenetic relationships of syndecans, using either the full-length sequences or the cytoplasmic domains. An unrooted phylogenetic tree was constructed from the Phylip distance matrix output of a TCOFFEE alignment [48] in DRAWTREE, and is presented in Phylodendron (D.G. Gilbert, version 0.8d) with the choice of intermediate node positions and node lengths for tree growth [89]. Aligned amino acid sequences of the full-length and cytoplasmic domains of syndecans were analyzed using PHYML, a maximum likelihood reconstruction method, with the WAG substitution model [51]. Bootstrap proportion was used to assess the strength of the topologies for this method [52]. The output data were used to obtain a consensus tree based upon the extended majority rule. The consensus outtree file (in Newick format) was used to prepare an unrooted tree in Phylodendron.
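
As an illustration of what a bootstrap proportion and a majority-rule consensus represent, the following minimal sketch counts how often each grouping of taxa recurs across bootstrap replicates. It is not the PHYML or consensus-tree implementation used in this study: the replicate groupings are invented toy data, the function names are ours, and only the plain majority-rule step is shown (the extended rule additionally admits compatible groupings with lower support).

```python
from collections import Counter


def bootstrap_support(replicate_groupings):
    """Fraction of bootstrap replicates in which each grouping of taxa occurs.

    replicate_groupings: list with one entry per replicate; each entry is a
    set of frozensets, and each frozenset is one group of taxa in that tree.
    """
    counts = Counter()
    for groupings in replicate_groupings:
        counts.update(groupings)
    n = len(replicate_groupings)
    return {group: count / n for group, count in counts.items()}


def majority_rule(support, threshold=0.5):
    """Keep groupings found in more than `threshold` of the replicates."""
    return {group for group, proportion in support.items() if proportion > threshold}


# Toy data (invented): four replicates over four taxa.
pair_13 = frozenset({"SDC1", "SDC3"})
pair_12 = frozenset({"SDC1", "SDC2"})
replicates = [{pair_13}, {pair_13}, {pair_13}, {pair_12}]

support = bootstrap_support(replicates)
consensus = majority_rule(support)
print(support[pair_13], support[pair_12])
```

In this toy example the first grouping is recovered in three of four replicates and is retained in the consensus, whereas the second grouping is not.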

E. WU-BLAST

The newly-identified syndecans from representative invertebrates, urochordate, amphibian and fish were compared with the well-characterized human syndecans using WU-Blast 2.0 against the Uniprot dataset [90, 91]. The conservation scores were calculated by dividing the bit score of each hit by the maximum bit score value obtained from the sequence used to conduct the WU-BLAST search [92].
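
The normalization can be sketched as follows. This is a minimal illustration under one reading of the procedure, namely that the maximum bit score is the highest score among the hits returned for a given query; the bit scores and hit names below are invented for the example and the function name is ours.

```python
def conservation_scores(bit_scores):
    """Normalize bit scores to the maximum bit score in the same search.

    bit_scores: dict mapping hit name -> bit score for one query sequence.
    Returns a dict mapping hit name -> score in the range (0, 1].
    """
    if not bit_scores:
        return {}
    max_score = max(bit_scores.values())
    if max_score <= 0:
        raise ValueError("maximum bit score must be positive")
    return {hit: score / max_score for hit, score in bit_scores.items()}


# Hypothetical bit scores for one query (values are illustrative only).
example_hits = {"hit_A": 240.0, "hit_B": 120.0, "hit_C": 60.0}
scores = conservation_scores(example_hits)
print(scores)
```

Dividing by the maximum places the results of queries of different lengths on a common scale, so that a score of 1.0 always marks the best-scoring hit for that query.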

Identification of synteny between syndecan genes

The chromosomal locations of syndecan genes were identified in the physically-mapped genomes of human [93], mouse [94], chicken [95], T. nigroviridis [75] and D. rerio (EBI assembly Zv5) by TBLASTN searches of each syndecan against the relevant genome. Genes located on either side of the target syndecan gene on the same contig (25 on each side, 50 in total) were identified and compared between organisms. In the case of different gene nomenclatures in different organisms, the encoded protein sequences were used in BLASTP or TBLASTN searches against the human genome assembly, to identify the corresponding human gene and its locus. The HUGO gene names are given in the figures.
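
The neighbor comparison can be sketched as below. This is an illustrative outline rather than the procedure as executed: it assumes that each contig has already been reduced to an ordered list of gene names mapped to a common nomenclature, the gene orders shown are invented toy data, and the function names are ours.

```python
def flanking_genes(gene_order, target, window=25):
    """Return up to `window` genes on each side of `target` in an ordered contig."""
    i = gene_order.index(target)  # raises ValueError if target is absent
    left = gene_order[max(0, i - window):i]
    right = gene_order[i + 1:i + 1 + window]
    return left + right


def shared_neighbors(order_a, order_b, target, window=25):
    """Genes found within the flanking window of `target` in both contigs."""
    neighbors_a = set(flanking_genes(order_a, target, window))
    neighbors_b = set(flanking_genes(order_b, target, window))
    return sorted(neighbors_a & neighbors_b)


# Toy gene orders (invented for illustration; not real map positions).
contig_species_1 = ["GENE_A", "PUM1", "SDC3", "MATN1", "GENE_B"]
contig_species_2 = ["GENE_C", "PUM1", "SDC3", "GENE_D", "MATN1"]

conserved = shared_neighbors(contig_species_1, contig_species_2, "SDC3", window=2)
print(conserved)
```

This sketch compares gene content only; it ignores gene order and orientation within the window, which a fuller assessment of synteny would also consider.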

Assessment of paralogy

To identify whether the syndecan and matrilin genes are present in paralogous regions of the human genome, the "dataset of paralogons in the human genome, v5.28" was searched [57]. These blocks were defined by McLysaght et al., using a coding sequence cutoff value of e = 10⁻⁷. Because of the low sequence identity of syndecan extracellular domains, syndecan-1 and -3 were not reproducibly recognized by these criteria. All syndecan genes are included in all blocks in Figure 5.