The chemokine superfamily includes a large number of ligands that bind to a smaller number of receptors [1, 2]. The best known function of the chemokines is the regulation of migration of various cells in the body, hence their name (from 'chemotactic cytokines'). The importance of the chemokines has grown in recent years, as it has become recognized that they are key players in many disease processes, including inflammation, autoimmune disease, infectious diseases (such as HIV/AIDS), and more recently, cancer (in particular in regulating metastasis) [3]. Multiple chemokine ligands can bind to the same receptor; the perceived complexity and promiscuity of receptor binding has often made this field a challenge to understand and given the impression that chemokines lack specific effects. We have now, however, probably identified most human chemokine ligands. The chemokines are small peptides, whereas their receptors are class A G-protein-coupled receptors. They are best known from mammals, but chemokine genes have also been found in chicken, zebrafish, shark and jawless fish genomes, and possible homologs of chemokine receptors have been reported in nematodes. Careful analysis of the members of the superfamily and their receptors shows a logical order to its genomic organization and function, which in turn is the result of evolutionary pressures. Here, we provide a global view of the chemokine and chemokine receptor superfamilies, focusing particularly on the relationship between their evolution and their functions.

The chemokine ligand and receptor superfamilies

As shown in Table 1, there are at least 46 chemokine ligands in humans. There are also 18 functionally signaling chemokine receptors (plus one, CXCR7, which has been recently reported as a potential chemokine receptor) and two 'decoy' or 'scavenger' receptors, DARC and D6, which are known to bind several chemokines but do not signal; their function may be to modulate inflammatory responses through their ability to remove chemokine ligands from inflammatory sites. In the second half of the 1990s, a large number of new ligands were discovered following the growth of expressed sequence tag (EST) databases. The chemokines were easy to recognize from their characteristic structure, containing several (usually four) cysteines in conserved positions, as well as from their relatively small size (8-14 kDa) and from the fact that they are produced in very large amounts by the cells that produce them. Their high expression levels may be due to the way they function, by establishing concentration gradients along which the responding cells migrate. The most recent human chemokine ligand to be reported (CXCL17, also called dendritic and monocyte chemokine-like protein, DMC) was found by fold-recognition methods [4].

Table 1 The chemokine superfamily

The members of the human and mouse chemokine superfamily are listed in Table 1, together with their receptors, and shown in schematic form in Figure 1; phylogenetic trees for the two superfamilies are shown in Figure 2. The two main chemokine ligand subfamilies are named according to the arrangement of the (typically four) cysteines within them: in the CC subfamily, the first two cysteines near the amino terminus are adjacent, whereas in the CXC subfamily there is one amino acid between them. The human molecules are represented using capital letters, whereas the mouse molecules use lower case, and an L or R is added to indicate ligand or receptor, respectively. For example, CCL5 is the human ortholog of a chemokine previously known as RANTES, Ccl5 is its mouse ortholog and CCR5 is a human receptor for several CCL ligands. Ligands encoded at a given chromosomal location, shown in the same color in Figure 1, usually bind the same receptor.
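The two naming rules described above can be illustrated with a short Python sketch. It is not part of the original analysis. The function names are hypothetical, the input is assumed to be a mature peptide (so that signal-peptide cysteines do not interfere), and the example sequences in the usage lines are invented toy strings rather than real chemokines. The CX3C spacing (three residues between the first two cysteines) is included because it is the motif of CX3CL1.

```python
import re

# Gap (number of residues) between the first two cysteines -> motif label.
MOTIF_BY_GAP = {0: "CC", 1: "CXC", 3: "CX3C"}

# Systematic names: family prefix, then L (ligand) or R (receptor), then a number.
NAME_PATTERN = re.compile(r"^(CC|CXC|CX3C|XC)([LR])(\d+)$", re.IGNORECASE)


def classify_by_cysteine_spacing(sequence):
    """Label a mature peptide by the spacing of its first two cysteines."""
    seq = sequence.upper()
    first = seq.find("C")
    if first == -1:
        return "unclassified"
    second = seq.find("C", first + 1)
    if second == -1:
        return "unclassified"
    gap = second - first - 1
    return MOTIF_BY_GAP.get(gap, "unclassified")


def parse_systematic_name(name):
    """Split a name such as 'CCL5' or 'Ccl5' into (species, family, kind, number).

    Upper-case names are read as human and anything else as mouse, following
    the convention in the text. Returns None for names that do not fit the
    simple pattern (for example older names such as 'RANTES').
    """
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    family, kind, number = match.groups()
    species = "human" if name.isupper() else "mouse"
    role = "ligand" if kind.upper() == "L" else "receptor"
    return (species, family.upper(), role, int(number))


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    for toy in ("GSACCLTYC", "GSACQCLTYC", "GSACQAACLTYC"):
        print(toy, classify_by_cysteine_spacing(toy))
    for name in ("CCL5", "Ccl5", "CCR5", "CX3CL1"):
        print(name, parse_systematic_name(name))
```

The classifier looks only at the first two cysteines, so it is a rough heuristic; a real annotation pipeline would also check the remaining conserved cysteines and the overall fold.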

Figure 1

A simplified diagram of the human chemokine superfamily, arranged by the receptors they bind to. Chemokines are represented by only their ligand number, and the receptor name also indicates whether each ligand is a CC or CXC; for example, the '6' adjacent to 'CXCR1' represents CXCL6. The colors represent the chromosomal location of the ligands: the genes encoding the ligands shown in the same color are at the same chromosomal location. It can be seen that ligands whose genes are at the same chromosomal location tend to bind to the same receptor. The extra lines attached to CXCL16 and CX3CL1 mean that these proteins exist as transmembrane proteins.

Figure 2

Sequence relationship analysis of the human (h) and mouse (m) (a) chemokines and (b) chemokine receptors. Phylogenetic trees were constructed using amino acid sequences with Clustal X and PAUP* (the neighbor joining method) programs [37]. In (a), the GRO and IP10 groups of CXC chemokines and the MCP and MIP groups of CC chemokines (see also Figure 3) are circled. Red letters indicate proteins that are found in only mouse or human but not the other. Blue letters indicate proteins for which the relationships are uncertain.

Some chemokines are produced in very large amounts by many different cell types (for example, CCL2, CCL3 and CCL5), whereas others can have very high specificity for particular tissues or cell types, such as CCL25 (thymus and intestine), CCL27 (skin keratinocytes), CCL28 (certain mucosal epithelial cells) or CXCL17 (stomach and trachea). Other important aspects that differ between chemokines include their biological activities, the regulation of their expression, their receptor-binding specificities and the chromosomal locations of the genes that encode them. These features of the chemokine superfamily have been determined by the forces that have shaped their molecular evolution.

Linking the evolution and function of chemokines

Classification, clustering and gene duplication

The chemokines have been divided into two major groups based on their expression patterns and functions - a useful division, though oversimplified. Those that are expressed by cells of the immune system (leukocytes) or related cells (epithelial and endothelial cells, fibroblasts and so on) only upon activation belong to the 'inflammatory' class, whereas those that are expressed in discrete locations in the absence of apparent activating stimuli have been classified as 'homeostatic' (Table 1). The genomic organization of chemokines (Table 1, Figure 3) also enables us, however, to divide chemokines into two alternative groups: those whose genes are located in large clusters at particular chromosomal locations (the 'major-cluster' chemokines; Figure 3a) and the 'non-cluster' or 'mini-cluster' chemokines whose genes are located separately in unique chromosomal locations (Figure 3b,c) [2]. There are two major clusters of CC chemokine genes and two of CXC genes, plus numerous non-clustered or mini-cluster genes of both types, in both the mouse and human genomes (Figure 3).

Figure 3

Schematic genomic organization of the human and mouse chemokine superfamily. (a) Major-cluster chemokines; (b) mini-cluster chemokines; (c) non-cluster chemokines. Solid arrows indicate chemokine genes and their transcriptional orientation; red, green and pink arrows indicate inflammatory, homeostatic and dual function chemokine genes, respectively, and gray arrows indicate pseudogenes. Duplication units in the major clusters are indicated by open yellow arrows. This figure is based on the NCBI 36 and 35 assemblies of the human and mouse genomes [38]. A gap indicates a region not yet covered by the genome sequencing consortiums, while a dashed line denotes a similar region of more than 1 Mb.

An explanation for this chromosomal arrangement is found in the evolutionary forces that have shaped the genome into gene superfamilies [5]. Over the course of evolution, gene duplication has been a common event, affecting most gene families [6]. Once a duplication occurs, the two copies can evolve independently and develop specialized functions. This explains the origin of the cluster chemokines, which show two other characteristics that do not apply to the non-cluster or mini-cluster chemokines: first, the members of a given gene cluster usually bind to multiple receptors and vice versa (the complex and promiscuous ligand-receptor relationships; Figure 1); and second, cluster chemokines often do not correspond well between species (for example, between human and mouse) [2].

These two characteristics can be explained as follows: the cluster chemokines and their receptors multiplied from their ancestral genes by a series of tandem gene-duplication events that occurred relatively recently in evolutionary terms, that is, even after the branching of human and mouse [2]. This is apparent from the phylogenetic tree shown in Figure 2, in which the cluster chemokines form compact clusters termed groups: the monocyte chemotactic protein (MCP) group, the macrophage inflammatory protein (MIP) group (both of CC chemokines), and the GRO group and the IP-10 group (both of CXC chemokines). This common evolutionary origin suggests that the cluster chemokines are a group of proteins sharing a common primary function. In the case of the chemokines encoded by the CXC GRO cluster on chromosome 4, which in human includes CXCL1-CXCL8, the primary function is the regulation of neutrophil recruitment to inflammatory sites [7]. The chemokines in this cluster do this through interaction with CXCR1 and CXCR2 (Table 1, Figure 1). Similarly, the main function of the chemokines encoded in the MIP and MCP clusters of CC chemokines on human chromosome 17, which include CCL1-CCL16, CCL18 and CCL23, is the recruitment of monocytes, subsets of T cells, eosinophils, and so on, to sites where inflammation is developing, through their interaction with CCR1, CCR2, CCR3 and/or CCR5 (Table 1, Figure 1).

Functional reasons for clustering

An explanation for the large number of ligands for these receptors is that, during inflammation, multiple chemokines can be needed to induce a robust leukocyte response [2]. Furthermore, differential expression of these chemokines among different tissues may finely orchestrate the recruitment of leukocytes to the tissues and could enable a 'customization' of the inflammatory responses. Accordingly, most cluster chemokines belong to the inflammatory category [2].

Clustering and its consequences could provide a critical survival advantage to a species faced with a particular infectious agent. For example, CCR5 expression has recently been shown to be pivotal in resistance to infection with the West Nile virus in humans [8]. The protective mechanism of CCR5 may involve directing leukocytes to the brain, where they can fight the infection more effectively [9]. Another hypothesis, however, involves 'viral' chemokines, believed to be mammalian genes that were at some point 'hijacked' by viruses. To cope with the proliferation of such viral chemokines, mammals may have increased the numbers of their own endogenous chemokines to circumvent the effects of the viral molecules. For example, humans have CCL3L1 and CCL4L genes, which are homologs of CCL3 and CCL4 [10] and are found in a unit of zero to three copies depending on the individual (Figure 3a); CCL3L1 has an affinity for CCR5 ten times higher than that of CCL3 [11]. This higher affinity ligand would give an evolutionary advantage for an organism when coping with viral infections.

These hypotheses also explain the lack of correspondence between cluster chemokine ligands in mouse and human, which may reflect the 'infectious experience' of the two species after they separated. This effect is shown graphically in the separation of the human and mouse chemokine clusters in the phylogenetic tree shown in Figure 2: in the groups of chemokines there is often no one-to-one correspondence between human and mouse genes or the relationships between them may be uncertain. This evolution is ongoing, and it is therefore possible that variations in these genes will be documented even among relatively close species.

The only CC cluster chemokine that has a one-to-one ligand/receptor relationship (with CCR8) is CCL1 (Figure 1, Table 1). Its specific receptor, CCR8, is expressed by monocytes, activated Th2 helper cells, natural killer T cells and CD4+ thymocytes [12], regulatory T cells [13], normal skin-homing T cells [14], and skin-homing γδ T cells and CD56+ CD16- natural killer cells [15]. The CCL1 gene is located in the MCP subregion (Figure 3a) but is rather distantly related to other members of the MCP group (Figure 2a), suggesting that it was generated much earlier than the rest of the cluster chemokines in this region. In fact, CCL1 may represent an early chemokine that branched before the CC cluster chemokines in the phylogenetic tree (Figure 2a). It is therefore possible that this chemokine-receptor pair has specific roles in shaping the immune system [16] and, in this context, its expression by regulatory T cells [13] is intriguing.

Non-cluster and mini-cluster chemokines

By contrast, the non-cluster or mini-cluster chemokines are relatively conserved between species and tend not to act on multiple receptors (Table 1, Figure 1). Indeed, several of these have a single ligand-receptor relationship, such as CCL25-CCR9 or CXCL13-CXCR5. The evolutionary model described above predicts that these particular chemokine ligand-receptor pairs probably have pivotal roles in the development of the organism or in the function of physiological systems necessary for the organism's survival to reproductive age (in other words, they are under evolutionary pressure). In support of this hypothesis, the genes for most homeostatic chemokines are found in non-cluster chromosomal locations (Table 1, Figure 3b,c). For example, CXCR4-deficient and CXCL12-deficient mice both have a lethal phenotype, and their embryos have various defects in critical organs, such as the heart, brain or bone marrow [17]. Therefore, throughout evolution, several non-cluster chemokines have participated in organogenesis, and their critical functions must be conserved in order for the species to survive. Another example is the CXCL13-CXCR5 pair, which is pivotal for successful B cell homing and, because it regulates T cell-B cell interactions, for the production of antibodies [18]. Thus, evolutionary pressure selects against changes in these genes by preventing them from diverging from their original function.

Early chemokines

In contrast to the cluster chemokines, the non-cluster and mini-cluster chemokines have been conserved throughout evolution and are therefore thought to be more 'ancestral' genes. This prediction is also supported by the phylogenetic tree shown in Figure 2, in which non-cluster and mini-cluster chemokines branch much earlier than the major-cluster chemokines and each human chemokine of this type has a clearly identifiable mouse counterpart [2]. There are data to support this model. Two groups have reported that, in the zebrafish, the CXCL12-CXCR4 pair regulates the homing of primordial germ cells to the gonads, where they differentiate into gametes [19, 20]. Importantly, the G-protein-coupled receptor Odysseus is readily recognizable as the zebrafish ortholog of CXCR4; 61% of the amino acid residues are identical between the zebrafish and human sequences (Figure 4). Similarly, the zebrafish ortholog of CXCL12 (with a remarkable 47% of residues in the coding region being identical; Figure 4) is also easy to identify.
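Identity figures such as the 61% and 47% quoted above are computed from aligned sequences. The Python sketch below shows one common way to do this; it is an illustration only and not the authors' method. It assumes the two sequences have already been aligned (for example by a tool such as Clustal X) and that gaps are written as `-`. It divides by the number of columns in which neither sequence has a gap, which is one of several conventions in use; the text does not say which denominator was applied. The function name is hypothetical and the strings in the usage line are invented toy fragments, not real CXCR4 or CXCL12 sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0   # columns where both sequences have a residue
    identical = 0  # columns where those residues match
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if res_a == res_b:
            identical += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy aligned fragments, invented for illustration only.
    print(percent_identity("KPVSLSYRCPC", "KPISLVERC-C"))
```

Choosing a different denominator (full alignment length, or the length of the shorter sequence) changes the number, so reported identities are only comparable when the convention is stated.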

Figure 4

Chemokine and chemokine receptor sequences, such as (a) CXCR4, (b) CXCL12 and (c) CXCL8, are highly conserved throughout evolution, from jawless fish to humans. Identical amino acid residues are highlighted in green; the seven transmembrane regions of the receptors are indicated by black lines; the four conserved cysteine residues are indicated by dots above the sequences. Species abbreviations: dare, Danio rerio (zebrafish); pema, Petromyzon marinus (sea lamprey); lafl, Lampetra fluviatilis (European river lamprey). Accession numbers (from GenBank) are as follows: human CXCR4, NM_003467; zebrafish cxcr4b, NM_131834; sea lamprey cxcr4, AY178969; human CXCL12, NM_000609; zebrafish cxcl12a, NM_178307; zebrafish cxcl12b, NM_198068; human IL-8, NM_000584; river lamprey CXCL8, AJ231072.

The zebrafish genome contains many other chemokine genes, including those with the GenBank accession numbers NM_131627 and NM_131062 [21], yet, in contrast to CXCL12, the correspondence of these molecules with human chemokines is not easy to establish. These observations underscore the importance of the CXCR4-CXCL12 pair throughout vertebrate evolution. GenBank now includes many chemokine gene entries from various genomes, including many mammals, shark, fish (including zebrafish) and even what may be homologs of chemokine receptor genes in Caenorhabditis elegans [22]. Another notable example is the chemokine LFCA-1 identified from the genome of the river lamprey (a jawless fish), which shows 46-49% identity to the chicken orthologs of CXCL8, K60 and 9E3 [23], and also has homology with human CXCL8 (Figure 4).

This interspecies genomic analysis will eventually help us understand the evolutionary history of the chemokine superfamily and may even allow us to identify a 'primordial' chemokine gene. It should be interesting to identify what the original function of this ancestral chemokine gene could have been. The function of the CXCR4-CXCL12 pair in the zebrafish in primordial germ cell homing suggests that chemokines and their receptors first arose as molecules controlling the transit of various cells within organisms simpler than mammals, and suggests that chemokines and their receptors have key roles in cellular transit in vivo during embryogenesis and/or in the adult organism. Another area of intense research is the function of chemokines in the development and function of the central nervous system [24]. This primary function in cellular traffic in vivo also supports a role for chemokines in cancer metastasis [25].

Recently, Balabanian et al. [26] reported the identification of a second human receptor (RDC-1) that binds CXCL12. The characterization of this receptor is ongoing, but it may also bind CXCL11. The sequence and characteristics of this receptor indicate that it belongs to the CXC receptor family and, as such, it should be named CXCR7. Its expression is more restricted than that of CXCR4, and it will be interesting to characterize its function in detail. RDC-1 may have another ligand [27], however, and it might, therefore, not be specific for CXCL12. Its capacity to bind CXCL12 suggests that it may represent another receptor (besides CXCR4) with important functions even in simpler organisms.

Mini-cluster chemokines and gene translocations

The evolution of the chemokines is an ongoing process, and there are examples of ligands forming 'mini-clusters' as well as major clusters (Figure 3b). One of these includes the CXCL9, CXCL10 and CXCL11 genes, which are located in the CXC IP-10 inflammatory cluster (4q21.21). The chemokines they encode function in T-cell recruitment through CXCR3 [28] and also in the negative control of angiogenesis through CXCR3B, an alternatively spliced variant of CXCR3 [29]. Another mini-cluster includes CCL19 and CCL21, which are located in close proximity (9p13 in human) and whose encoded chemokines share a receptor, CCR7. Likewise, human CCL17 and CCL22 are located in close proximity (16q13 in human) and their chemokines share a receptor (CCR4). Interestingly, another protein encoded in the same mini-cluster as CCL17 and CCL22, CX3CL1 (previously called fractalkine), is totally different from them: it is a transmembrane-type chemokine with the CX3C motif (two cysteines separated by three amino acids) instead of the CC motif and interacts specifically with CX3CR1 (Figure 1, Table 1). The position of CX3CL1 is probably due to its translocation from elsewhere to between CCL17 and CCL22 (Figure 3b).

Another example of a translocation is CCL27, which maps in close vicinity to CCL19 and CCL21 (Figure 3b) but does not share CCR7 with the encoded chemokines (Table 1). Instead, CCL27 is most similar to CCL28, and they share CCR10 (Table 1). Thus, it is possible that CCL27 was originally located at chromosome 5p12 and may have translocated to its present site. Alternatively, the location of the CCL27 gene could be explained by the fact that the gene for the α chain of the interleukin 11 receptor is located at this site but in the opposite orientation [30], indicating that this locus has been subjected to multiple evolutionary forces. Further evidence that chemokine evolution is ongoing is provided by XCL1 and XCL2 (previously called lymphotactin), which are the result of a recent gene duplication as they only differ by one amino acid [31] and they share the receptor XCR1 [32] (Figure 3b, Table 1). Another example (in the mouse) is Ccl21, which is encoded by three different genes that differ in one amino acid codon and are expressed in distinct anatomical locations [33].
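The reasoning used in this section (neighboring genes usually share a receptor, so a neighbor with a different receptor is a candidate translocation) can be expressed as a small Python sketch. It is an illustration under stated assumptions, not the analysis behind Figure 3. The loci and receptors in the table are taken from the text above; placing CCL27 at 9p13 and CCL28 at 5p12 is a simplification of "close vicinity" and of the suggested original location, and treating a cytogenetic band as a single locus is also an assumption. The function name is hypothetical.

```python
from collections import Counter, defaultdict

# (ligand, locus, receptor) as described in the text; band-level loci are a simplification.
LIGANDS = [
    ("CXCL9", "4q21.21", "CXCR3"),
    ("CXCL10", "4q21.21", "CXCR3"),
    ("CXCL11", "4q21.21", "CXCR3"),
    ("CCL19", "9p13", "CCR7"),
    ("CCL21", "9p13", "CCR7"),
    ("CCL27", "9p13", "CCR10"),
    ("CCL17", "16q13", "CCR4"),
    ("CCL22", "16q13", "CCR4"),
    ("CX3CL1", "16q13", "CX3CR1"),
    ("CCL28", "5p12", "CCR10"),
]


def receptor_outliers(ligands):
    """Return {locus: [ligands whose receptor differs from the locus majority]}."""
    by_locus = defaultdict(list)
    for name, locus, receptor in ligands:
        by_locus[locus].append((name, receptor))

    outliers = {}
    for locus, members in by_locus.items():
        counts = Counter(receptor for _, receptor in members)
        majority_receptor, majority_size = counts.most_common(1)[0]
        if majority_size < 2:
            continue  # no receptor is shared at this locus, so nothing to compare against
        flagged = sorted(name for name, receptor in members if receptor != majority_receptor)
        if flagged:
            outliers[locus] = flagged
    return outliers


if __name__ == "__main__":
    for locus, names in receptor_outliers(LIGANDS).items():
        print(locus, names)
```

With the table above, the sketch flags CCL27 and CX3CL1, the two genes discussed as possible translocations. A flag of this kind is only a prompt for closer inspection, since a different receptor can also arise by divergence in place.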

Of mice and men

The mouse is generally considered a valuable model for human diseases. The completion of the mouse genome supports this view, because it seems to be remarkably similar to the human genome [34]. Analysis of the human and mouse genomes has revealed that the genes involved in immune and host defense roles are under positive selection pressure, accumulating amino acid changes more rapidly than other genes. Chemokines are listed as one of the eight most rapidly changing proteins and domains [35]. Examination of the gene organization of human and mouse chemokine clusters also shows great divergence (Figure 3) [36]. The following are three important differences.

First, some chemokine genes exist in one species but not the other. This is the most dramatic example of lack of correlation between species and applies specifically to the inflammatory/cluster chemokines. Table 1 and Figure 3a show that, in the CXC subfamily, CXCL8 does not have a mouse counterpart, whereas Cxcl15 exists in the mouse but not in human. Among the CC subfamily (Figure 3b), CCL13 and CCL14 exist in the human but not in the mouse. Alternatively, a given gene in one species (for example, CCL16 and CCL18) may be represented by a pseudogene in the other.

Second, a given chemokine may be related to (or represented by) more than one ortholog in the other species (Table 1). This is due to independent duplication events that have occurred in one of the species. Human XCL1 and XCL2 and the varying numbers of copies of human CCL3 and CCL4 and of mouse Ccl27, Ccl19 and Ccl21 described above are examples of this.

Third, there can be similar genes in the two species but they may not be 'exact' structural or functional equivalents. One of the best examples of the latter is the MCP group. Structurally, it is difficult to assign a human counterpart unambiguously to each mouse gene, because they are all closely related molecules that probably arose independently in each species (Figure 2a).

Differences like these may result in important differences in the function of chemokines between species. These potential differences do not exclude the mouse as a valid model for human disease, but they do mean that there are limitations to the extrapolations we can make when using mouse models to understand human disease. It is worth emphasizing that these differences may be particularly important in studies of inflammatory diseases, which involve the inflammatory chemokines (most of which are major-cluster chemokines), and less so in experiments designed to understand the function of homeostatic chemokines, which, because they are generally non-cluster chemokines and thus more conserved between species, should be more readily applicable to the human system.

The progress in the discovery and characterization of chemokines has been remarkable, and we are approaching the completion of the discovery phase of many other molecular superfamilies. The sudden availability of so many new molecules is an excellent opportunity for understanding the roles of chemokines, not only in the immune system, but also in development and general physiology. Analysis of the syntenic genomic regions between mouse and human has enabled investigation of the relationships between the chemokines of these species. The mouse is a popular model for investigating gene function, but it is important that the significant differences in the chemokine ligand superfamily between mouse and human are taken into account, especially as the ability to extrapolate mouse data to human disease depends on the gene under study. This type of analysis should be applicable to other molecular superfamilies. It is our hope that the issues we have discussed here will facilitate understanding of the biology of the chemokine superfamily.