Immunogenetics, Volume 55, Issue 8, pp 570–581

Genomic analysis of immunity in a Urochordate and the emergence of the vertebrate immune system: “waiting for Godot”

Authors

  • Kaoru Azumi
    • Department of Biochemistry, Graduate School of Pharmaceutical Sciences, Hokkaido University
  • Rosaria De Santis
    • Stazione Zoologica “Anton Dohrn”
  • Anthony De Tomaso
    • Department of Pathology, Stanford University School of Medicine
  • Isidore Rigoutsos
    • Bioinformatics and Pattern Discovery Group, IBM Thomas J. Watson Research Center
  • Fumiko Yoshizaki
    • Department of Biological Sciences, Graduate School of Science, University of Tokyo
  • Maria Rosaria Pinto
    • Stazione Zoologica “Anton Dohrn”
  • Rita Marino
    • Stazione Zoologica “Anton Dohrn”
  • Kazuhito Shida
    • Center for Interdisciplinary Research, Tohoku University
  • Makoto Ikeda
    • Center for Interdisciplinary Research, Tohoku University
  • Masami Ikeda
    • Department of Electronic Information System Engineering, Faculty of Science and Technology, Hirosaki University
  • Masafumi Arai
    • Department of Electronic Information System Engineering, Faculty of Science and Technology, Hirosaki University
  • Yasuhito Inoue
    • Department of Electronic Information System Engineering, Faculty of Science and Technology, Hirosaki University
  • Toshio Shimizu
    • Department of Electronic Information System Engineering, Faculty of Science and Technology, Hirosaki University
  • Nori Satoh
    • Department of Zoology, Graduate School of Science, Kyoto University
  • Daniel S. Rokhsar
    • U.S. Department of Energy Joint Genome Institute
  • Louis Du Pasquier
    • Department of Zoology, University of Basel
  • Masanori Kasahara
    • Department of Biosystems Science, School of Advanced Sciences, The Graduate University for Advanced Studies (Sokendai)
  • Masanobu Satake
    • Department of Molecular Immunology, Institute of Development, Aging and Cancer, Tohoku University
    • Department of Biological Sciences, Graduate School of Science, University of Tokyo
Original Paper

DOI: 10.1007/s00251-003-0606-5

Cite this article as:
Azumi, K., De Santis, R., De Tomaso, A. et al. Immunogenetics (2003) 55: 570. doi:10.1007/s00251-003-0606-5

Abstract

Genome-wide sequence analysis in the invertebrate chordate, Ciona intestinalis, has provided a comprehensive picture of immune-related genes in an organism that occupies a key phylogenetic position in vertebrate evolution. The pivotal genes for adaptive immunity, such as the major histocompatibility complex (MHC) class I and II genes, T-cell receptors, or dimeric immunoglobulin molecules, have not been identified in the Ciona genome. Many genes involved in innate immunity have been identified, including complement components, Toll-like receptors, and the genes involved in intracellular signal transduction of immune responses, and these show both expansion and unexpected diversity in comparison with the vertebrates. In addition, a number of genes were identified that are predicted to encode integral membrane proteins with extracellular C-type lectin or immunoglobulin domains and intracellular immunoreceptor tyrosine-based inhibitory motifs (ITIMs) and immunoreceptor tyrosine-based activation motifs (ITAMs) (plus their associated signal transduction molecules), suggesting that activating and inhibitory receptors have an MHC-independent function and an early evolutionary origin. A crucial component of vertebrate adaptive immunity is somatic diversification, and the recombination activating genes (RAG) and activation-induced cytidine deaminase (AID) genes responsible for the generation of diversity are not present in Ciona. However, there are key V regions, the essential feature of an immunoglobulin superfamily VC1-like core, and possible proto-MHC regions scattered throughout the genome waiting for Godot.

Keywords

Genome analysis · Immunological genes · Urochordate · Evolution

Introduction

A remarkable event in the evolution of the vertebrate immune system was the seemingly abrupt emergence of adaptive immunity at the jawed vertebrate stage. While every major component of the mammalian adaptive immune system, such as the MHC, TCR, and Ig molecules, as well as the genes directly involved in somatic generation of diversity, such as RAG1 and 2, has been identified in the cartilaginous fish, all attempts to isolate these genes from organisms more distantly related to bony vertebrates than the cartilaginous fish (i.e., the jawless fish and below) have been unsuccessful (Flajnik and Kasahara 2001).

While some recent results hint at the presence of an ancestral adaptive immune system, such as the presence of lymphocyte-like cells and orthologues to mammalian lymphocyte-specific genes reported in lamprey (Mayer et al. 2002; Uinuk-Ool et al. 2002), and a possible proto-MHC region identified in amphioxus (Abi-Rached et al. 2002), direct predecessors of any component of adaptive immunity have never been identified. However, these negative results are based on techniques such as low-stringency cloning, which are not infallible. A systematic search for adaptive immune-related genes or their harbingers has not yet been possible in jawless vertebrates or invertebrate deuterostomes, due to the absence of a complete genome sequence of a suitable organism.

Unlike adaptive immunity, innate immunity is much more ancient in origin and some innate immune-related genes, such as Toll-like receptors (TLR) and their downstream signal transduction molecules, are shared by vertebrates and the protostome Drosophila (Hoffmann et al. 1999). However, some components of innate immunity are considered to be specific to deuterostomes or chordates. For example, complement factor B (Bf) has been found in sea urchin and ascidians but not in the Drosophila genome, suggesting that Bf was an innovation within the deuterostome or chordate lineages (Nonaka 2001). Understanding the evolutionary history of the innate immune system in the chordates has also been hampered by the lack of a complete genome sequence of an appropriate model organism.

Annotation of the recently sequenced Ciona intestinalis genome (Dehal et al. 2002) has allowed an overview of the immune system in an organism that occupies a key phylogenetic position in vertebrate evolution. Ciona belongs to the Urochordata, one of the three subphyla of the phylum Chordata. The remaining subphyla, Vertebrata and Cephalochordata, are considered to be sister groups (Cameron et al. 2000; Wada and Satoh 1994); thus a comparison between mammalian and Urochordata genomes is particularly relevant for the elucidation of the ancestral genome of Chordata. In particular, the Urochordata genome is expected to provide information on the ancestral state prior to the two rounds of complete genome duplication postulated to have occurred at the early stage of vertebrate evolution (Ohno 1999).

Materials and methods

The draft genome sequence of C. intestinalis has been determined recently by a whole-genome shotgun approach (Dehal et al. 2002), and updated assemblies are available at http://www.jgi.doe.gov/ciona. Several of the authors of this manuscript participated in the annotation workshop that was held at the U.S. Department of Energy Joint Genome Institute (DOE-JGI) in April 2002. In the time since then, gene product searches and annotation were performed with the help of a web-based annotation and maintenance system that DOE-JGI made available to the participants. The actual sequences corresponding to the gene models reported in this paper are found in http://genome.jgi-psf.org/ciona4/ciona4.home.html as follows: (1) go to the “search” page, (2) enter the gene model name (such as grail.1.1.1) into the box marked ‘Name equals’, (3) click the ID number, and (4) click the “Get Sequence” tab.

For several aspects of the above-described analysis, we made use of a pattern-based method that determined and reported putative C. intestinalis members for various families of immunological interest (Rigoutsos and Floratos 1998; Rigoutsos et al. 1999). The method has several stages and can be briefly summarized as follows: starting with a collection of known members of a family F, we generate a collection of patterns that are specific to the family’s members, subselect only those that are statistically significant, and use them to scan the proteome of C. intestinalis in order to identify those sequences that contain one or more of these family-specific patterns. A subsequent iterative procedure confirms, refines and further augments the set of results. A detailed description of this method is presented as supplementary material.
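The scanning stage of such a procedure can be illustrated with a toy sketch. This is not the pattern discovery algorithm cited above: the patterns are hand-written regular expressions rather than discovered and statistically filtered ones, and the function name `scan_proteome`, the pattern strings, and the sequences are all hypothetical.

```python
import re

def scan_proteome(proteome, patterns):
    """Return {sequence_id: [matching patterns]} for sequences hit by >= 1 pattern."""
    compiled = [(p, re.compile(p)) for p in patterns]
    hits = {}
    for seq_id, seq in proteome.items():
        matched = [p for p, rx in compiled if rx.search(seq)]
        if matched:
            hits[seq_id] = matched
    return hits

# Hypothetical family-specific patterns ('.' stands for any residue).
patterns = ["C.{2}C..H", "GW.{3}F"]

# Hypothetical proteome fragment keyed by gene-model name.
proteome = {
    "model_a": "MKTCAACLLHGG",
    "model_b": "MSGWAAAFPLK",
    "model_c": "MAAAAAAAAAA",
}

hits = scan_proteome(proteome, patterns)
for seq_id, matched in hits.items():
    print(seq_id, matched)
```

In the real method the candidate set would then be fed back into an iterative confirmation and refinement step, which this sketch omits.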

A computer-assisted search of motif-bearing, transmembrane proteins consisted of several steps as follows. First, candidates for the transmembrane (TM) proteins were extracted from the proteome by the SOSUI program (Mitaku et al. 1999) (5,505 grails were extracted from the total 26,759 grails). Second, the signal sequence, if present, was deleted from each TM protein sequence after detection by the DetecSig program (Lao 2001). Third, the ConPred program (Ikeda et al. 2002) was used to predict the number of TM segments and the topology (amino-terminal of polypeptide, outside or inside of the cell) of each grail. At this step, the 1,745 sequences having no predicted TM segment were segregated as soluble proteins and the remaining 3,760 grails (14.1% of total grails) were predicted to be TM proteins. Finally, the TM proteins that have motifs characteristic of immunity-related proteins were extracted.
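The counts reported for this filtering pipeline can be checked with a few lines of arithmetic. The variable names below are ours; the numbers are those given in the text.

```python
total_models = 26_759      # all gene models ("grails") in the proteome
sosui_candidates = 5_505   # TM candidates extracted by SOSUI (step 1)
no_tm_segment = 1_745      # candidates with no predicted TM segment (step 3)

# Models retained as predicted TM proteins after step 3.
predicted_tm = sosui_candidates - no_tm_segment
fraction_tm = predicted_tm / total_models

print(predicted_tm)           # 3760
print(f"{fraction_tm:.1%}")   # 14.1%
```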

Results

No evidence for adaptive immunity

Somatically diversified receptors

The vertebrate antigen-specific receptor comprises Ig superfamily (Igsf) variable and constant domains. The variable domain genes are inherited as segments that have to be rearranged somatically in the lymphocyte lineage to create functional V genes. The constant domains belong to the C1 type that differs structurally from the C2 and I-set type. Unlike C2 and I-set domains, C1 domains are found only in a restricted number of molecules such as MHC class I and II, tapasin, SIRPs (signal regulatory protein), Ig, and TCR, and, to date, have been identified only in jawed vertebrates (Du Pasquier 2000). A search of the Ciona genome provided no evidence for the existence of Ig or TCR, which are defined by segmental V and C1 domains, or of RAG1/2 homologues. Additionally, no activation-induced cytidine deaminase (AID) homologue was found; the entire Apobec gene family to which AID belongs seems to be missing from Ciona. With respect to enzymes involved with somatic modifications of antigen receptor genes in vertebrates, terminal deoxynucleotidyl transferase (TdT) is the only one with a relatively good homologue in Ciona. TdT cDNAs have been found in Ciona egg and larval libraries (Ciona intestinalis cDNA project; http://ghost.zool.kyoto-u.ac.jp/indexrl.html). Consequently, there is no evidence for somatic diversification of any Ciona V gene, and if it were to exist, it would be due to mechanisms unlike those found in the adaptive immune receptors of jawed vertebrates.

MHC class I and II, and genes involved in antigen processing and presentation

MHC class I or II genes were not found by either Blast searches (Altschul et al. 1990) or by a more sensitive pattern discovery-based method (see Materials and methods). Many Ciona genes were predicted to possess Ig domains, including a few genes with C1-like domains. However, the peptide-binding domain, another constituent of the MHC class I and II molecules, was not identified in Ciona. This result suggests that this domain, and consequently the MHC class I and II molecules, are recent innovations in the vertebrate lineage. Alternatively, the MHC peptide-binding domain may have evolved too fast to be detected by sequence homology between species that diverged more than 550 million years ago.

The potential for antigen presentation can also be assayed, in part, by analyzing the proteasome subunit genes, a necessary component of antigen processing. The proteasome is a cytoplasmic protease complex essential to all eukaryotic cells. Its central part, the 20S proteasome, is composed of 14 different constitutively expressed subunits. During immune reactions in jawed vertebrates, three additional subunits inducible by interferon-γ (IFN-γ) displace their constitutive counterparts and change the specificity of the proteasome so that it produces class I-binding peptides more efficiently (Rock and Goldberg 1999). Although all 20S proteasome subunits are structurally related, the extent of similarity between the three IFN-γ-inducible and corresponding constitutive subunits is much higher, thus suggesting that the inducible subunits arose by recent gene duplication from their constitutive counterparts. The Ciona genome contains a total of 14 proteasome subunit genes, predicted by 12 complete gene models and two truncated models (Fig. S1, Electronic Supplementary Material). Even for the truncated models, there exist corresponding expressed sequence tags (ESTs), which indicates that all 14 genes are transcribed. The three Ciona genes, which occur in both constitutive and inducible forms in jawed vertebrates, show closer sequence similarity to the constitutive forms. However, a detailed phylogenetic analysis indicated that they qualify as orthologues of the constitutive and inducible subunits. Since the constitutive subunit genes serve as a control, this result strongly suggests the absence of inducible subunit genes in the Ciona genome, rather than a failure on our part to detect them. This is also true for the proteasome activator gene family PA28. Although mammals have one constitutive PA28γ and IFN-γ-inducible PA28α and PA28β genes, only one expressed gene, thought to be an orthologue of the three mammalian genes, was identified in Ciona. 
Two other genes showing closer similarity to PA28γ are present in Ciona, but they have no corresponding EST. These results indicate the absence of IFN-γ-inducible PA28 genes in this genome. In addition, Ciona contains no genes that code for TAP1 and TAP2, although it contains a vertebrate ABCB9 (TAPL) orthologue (data not shown). Similarly, it contains no genes for tapasin, the invariant chain, or the MHC class II transactivator. Taken together, these results strongly indicate the absence of the MHC-based antigen processing and/or presenting system in Ciona.

Innate immunity

Complement system

The mammalian complement system consists of about 35 serum and cell surface proteins, and plays a pivotal role in host defense against infection (Volanakis 1998). Several complement genes have been isolated from invertebrate deuterostomes such as sea urchin, ascidian and amphioxus, indicating that the complement system preceded adaptive immunity (Nonaka 2001). In the ascidian complement system, the following genes have been identified: C3, Bf, mannose-binding protein (MBP)-associated serine proteases (MASPs), ficolins, as well as the alpha chain of the complement receptor CR3. However, the complete scope of the ascidian complement system remains to be clarified. We report here the result of a systematic search for the possible complement genes in the Ciona genome, mainly based on domain structures (Fig. 1).
Fig. 1

Possible complement genes identified in the Ciona genome. Using an ordinary Blast, as well as the more sensitive method described in Materials and methods, we searched the gene models of C. intestinalis for genes with significant homology to vertebrate complement genes. Each candidate thus identified was manually examined for its domain structure, and only those genes with the same or nearly the same domain structures as those found in the vertebrate counterparts were counted. The left half of the figure shows the schematic domain structures of complement genes, and the right half the copy numbers in Ciona and human. The identities of domains are explained at the bottom

Before the initiation of the complement activation, the host molecules must recognize invading pathogens. The mammalian complement system has three kinds of initiation molecules: C1q, MBP, and the recently discovered ficolins which use the C1q domain, the C-type lectin domain and the fibrinogen domain, respectively, for recognition/binding. They all contain the collagen domain, which binds serine proteases (C1r, C1s or MASPs responsible for activation of the classical and lectin pathways). The Ciona genome contains nine MBP, nine ficolin and two C1q genes, and most genes are transcriptionally active as shown by the presence of ESTs. Some of these genes are duplicated in tandem, whereas some EST clones suggest the possibility of alternative splicing. MBP and ficolin genes show a higher copy number in Ciona than in mammals, suggesting more extensive gene duplications in the ascidian lineage (Figs. S2 and S3, Electronic Supplementary Material).

The central component of the mammalian complement system, C3, has two homologues, C4 and C5, generated by gene duplication. In addition, a serum-protease inhibitor, alpha-2-macroglobulin (α2M) shows a weak but significant similarity to C3/C4/C5; the latter is believed to have originated from α2M by gene duplication (Sottrup-Jensen et al. 1985). However, the exact timing of these gene duplication events has yet to be determined. The Ciona genome codes for two C3-like genes (Marino et al. 2002) and two α2M genes. An ancestor of the two C3-like genes seems to have diverged from a common ancestor of vertebrate C3/C4/C5 and has duplicated into two genes in the Ciona lineage (Fig. S4, Electronic Supplementary Material).

Mammalian Bf and C2 are catalytic subunits of the C3 convertases of the alternative and classical pathways, respectively; they are gene duplication products which share the same characteristic domain structure. In Ciona, there exist three Bf-like genes that are supported by cDNA evidence. The Bf-like genes in Ciona are longer than those of jawed vertebrates by virtue of having one additional short consensus repeat (SCR) domain and two low-density lipoprotein receptor (LDLR) domains. Phylogenetic analysis suggests that they have diverged from the common ancestor of vertebrate Bf and C2, prior to the divergence of Bf and C2 (Fig. S5, Electronic Supplementary Material).

MASP/C1r/C1s are proteolytic enzymes responsible for the activation of the mammalian complement system through the lectin or classical pathway. They share a unique domain structure that is not found in any other proteins. In H. sapiens, at least six protein forms are known and are derived from four genes. The Ciona genome contains four such genes and phylogenetic analysis indicates that they are derived directly from the common ancestor of vertebrate MASP/C1r/C1s and duplicated before the divergence of Ciona and another ascidian Halocynthia roretzi (Fig. S6, Electronic Supplementary Material).

The mammalian complement late components form the membrane attack complex (MAC) on the pathogen cell membrane, which leads to cytolysis. These MAC components, together with perforin (an effector molecule released by cytotoxic T and natural-killer cells), share a unique domain known as the MAC/perforin domain. We have found 11 gene models with the MAC/perforin domain, and nine of them have a unique domain structure similar to that of human late components. The domain structure of these molecules is TSP, TSP, LDLR, MAC/perforin, and EGF from the N terminus. It is notable that, with the exception of one, all C6-like genes have corresponding EST sequences. Phylogenetic analysis indicates that the C6-like genes have expanded in the ascidian lineage (Fig. S7, Electronic Supplementary Material).
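Classifying gene models by domain architecture, as done here for the C6-like genes, can be sketched as an ordered comparison of domain lists. This is only an illustration: the model names and domain annotations are hypothetical, and the check demands an exact match, whereas the manual examination in this study also accepted nearly identical structures.

```python
# Reference architecture from the text, in N- to C-terminal order.
C6_LIKE = ["TSP", "TSP", "LDLR", "MACPF", "EGF"]

def has_architecture(domains, reference=C6_LIKE):
    """True if the ordered domain list equals the reference architecture."""
    return list(domains) == reference

# Hypothetical gene models with their predicted domains.
models = {
    "model_1": ["TSP", "TSP", "LDLR", "MACPF", "EGF"],
    "model_2": ["MACPF"],
    "model_3": ["TSP", "LDLR", "MACPF", "EGF"],
}

c6_like = [name for name, doms in models.items() if has_architecture(doms)]
print(c6_like)
```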

The mammalian complement system includes regulatory components that inhibit undesirable complement activation against one’s own cells. Most mammalian complement regulators are composed of repeats of the SCR domain. The Ciona genome contains 132 gene models with SCR domains (Table 1); the cut-off threshold for this prediction was an E value < 10^-7, and 95 of these models have SCR homology at E < 10^-10. As such, many of these genes are expected to be actual complement regulators.
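The effect of the two E-value cut-offs can be illustrated with a minimal sketch. The E values below are hypothetical and are not data from this study; only the two thresholds come from the text.

```python
def count_below(evalues, cutoff):
    """Count hits whose E value is strictly below the cutoff."""
    return sum(1 for e in evalues if e < cutoff)

# Hypothetical E values for five candidate SCR-domain hits.
evalues = [3e-12, 5e-9, 2e-8, 1e-15, 4e-6]

lenient = count_below(evalues, 1e-7)    # prediction threshold used in the text
strict = count_below(evalues, 1e-10)    # more stringent threshold

print(lenient, strict)
```

Every hit that passes the stricter threshold also passes the lenient one, which is why the 95 models form a subset of the 132.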
Table 1 Domain-based comparative protein analysis in humans (H), Ciona intestinalis (C), fruit fly (F), worm (W), yeast (Y) and thale cress (T). The number of C. intestinalis genes with respective domains was estimated by computer searches with the pattern discovery method (see Materials and methods), followed by manual examination (NA not available)

| No. | Accession no. | Domain name | C (pdm) | C (pfam) | H | F | W | Y | T |
|---|---|---|---|---|---|---|---|---|---|
| 1 | PF00129 | MHC_I | 0 | 0 | 18 | 0 | 0 | 0 | 0 |
| 2 | PF00993 | MHC_II_alpha | 0 | 0 | 5 | 0 | 0 | 0 | 0 |
| 3 | PF00969 | MHC_II_beta | 0 | 0 | 7 | 0 | 0 | 0 | 0 |
| 4 | PF00277 | SAA_proteins | 0 | 0 | 4 | 0 | 0 | 0 | 0 |
| 5 | PF00323 | Defensins | 0 | 0 | 2 | 0 | 0 | 0 | 0 |
| 6 | PF00711 | Defensin_beta | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 7 | PF00666 | Cathelicidins | 0 | 0 | 2 | 0 | 0 | 0 | 0 |
| 8 | PF01109 | GM_CSF | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 9 | PF00143 | Interferon | 0 | 0 | 7 | 0 | 0 | 0 | 0 |
| 10 | PF00714 | IFN-gamma | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 11 | PF00340 | IL1 | 0 | 0 | 7 | 0 | 0 | 0 | 0 |
| 12 | PF00715 | IL2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 13 | PF02059 | IL3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 14 | PF00727 | IL4 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 15 | PF02025 | IL5 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 16 | PF00489 | IL6 | 0 | 0 | 2 | 0 | 0 | 0 | 0 |
| 17 | PF01415 | IL7 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 18 | PF00048 | IL8 | 0 | 0 | 32 | 0 | 0 | 0 | 0 |
| 19 | PF00726 | IL10 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 20 | PF02372 | IL15 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 21 | PF01291 | LIF_OSM | 0 | 0 | 2 | 0 | 0 | 0 | 0 |
| 22 | PF01821 | ANATO | 2 | 0 | 6 | 0 | 0 | 0 | 0 |
| 23 | PF00386 | C1q | 3 | 2 | 24 | 0 | 0 | 0 | 0 |
| 24 | PF00354 | Pentaxin | 2 | 1 | 9 | 0 | 0 | 0 | 0 |
| 25 | PF01823 | MACPF | 13a | 13a | 6 | 0 | 0 | 0 | 0 |
| 26 | PF00229 | TNF | 1 | 1 | 12 | 0 | 0 | 0 | 0 |
| 27 | PF00047 | Ig | 107 | 59 | 381 | 125 | 67 | 0 | 0 |
| 28 | PF00147 | Fibrinogen_C | 76 | 70 | 26 | 10 | 6 | 0 | 0 |
| 29 | PF00084 | Sushi | 132 | 87 | 53 | 11 | 8 | 0 | 0 |
| 30 | PF00059 | Lectin_c | 120 | 82 | 47 | 23 | 91 | 0 | 0 |
| 31 | PF00057 | Ldl_recept_a | 118 | 82 | 35 | 33 | 27 | 0 | 0 |
| 32 | PF00431 | CUB | 105 | 88 | 47 | 9 | 43 | 0 | 0 |
| 33 | PF01839 | FGGAP | 15 | 13 | NA | NA | 2 | 1 | NA |
| 34 | PF00362 | Integrin_B | 6 | 5 | 8 | 2 | 2 | 0 | 0 |
| 35 | PF01582 | TIR | 3 | 2 | 18 | 8 | 2 | 0 | 131 |
| 36 | PF00092 | Vwa | 114 | 80 | 34 | 0 | 17 | 0 | 1 |
| 37 | PF00008 | EGF | 139 | 164 | 108 | 45 | 54 | 0 | 1 |

a Two of 13 MACPF models are incomplete, and are not counted in Fig. 1 and Fig. S7, Electronic Supplementary Material. ANATO domains in Ciona were found in fibulin, not in the C3a portion of the C3 genes

Toll-like receptors

The mammalian genome encodes several TLRs, with each TLR responsible for detecting corresponding pathogen-associated molecular patterns (Medzhitov and Janeway 2000). The Ciona genome has only three TLR genes, characterized by the extracellular leucine-rich repeat (LRR) motif and the intracellular Toll/IL-1R (TIR) domain (Table S1, Electronic Supplementary Material). The genes involved in the TLR signaling pathway were also identified, including MyD88, characterized by the TIR and Death domains, IRAK (IL-1 receptor-associated kinase), TRAF (TNF receptor-associated factor), NFκB and IκB (Fig. 2). Mouse RP105 protein is an atypical member of the mammalian TLR family as it possesses only multiple LRR motifs and no TIR domain (Miyake et al. 1995). Ten gene models with domain architecture similar to that of RP105 were identified. As LRR is a motif that also functions in protein-protein interactions and is involved in cell-cell communication, it is conceivable that some of the LRR-containing Ciona genes actually encode cell-adhesion molecules and not pathogen-recognizers.
Fig. 2

Innate immunity-related, transmembrane proteins and signaling molecules identified in the Ciona genome. The genes were placed in the figure according to the presumed cellular localization of their protein products. Arrows connecting individual components were placed according to the assumption that Ciona and mammals share basic signal transduction pathways. It must be noted that, although the figure is illustrated as if all the genes listed are expressed in a single coelomocyte, expression profiles are not known for most genes. The identities of effector genes involved in cell proliferation and differentiation, inflammation and anti-microbial effects are not known. The names of each gene and domain are shown in black and blue, respectively. The number of gene models identified is indicated in parentheses for each gene. See the text for abbreviations

Cytokines

The Ciona genome contains three IL-1R (IL-1 receptor)-like genes with an extracellular Ig and an intracellular TIR domain (Table S1, Electronic Supplementary Material). However, the direct homologue of mammalian IL-1 was not detected in the Ciona genome. As for other cytokines and their receptors, one TNF (tumor necrosis factor)-like and three TNFR (tumor necrosis factor receptor)-like genes were identified (Terajima et al. 2003). The signaling pathways mediated by TLR, IL-1R and TNFR share some components such as MyD88, IRAK and TRAF. It is possible that these signaling cascades are indeed functioning in Ciona coelomocytes (Fig. 2). With the exception of TNF, all cytokine-coding genes appear to be missing from the Ciona genome.

Immunoglobulin superfamily molecules

Igsf domains are found in many invertebrate proteins, some of which are involved in immunity. Among the genes with V domains, Ciona has the homologues of the amphioxus VCBP, a small multigene family whose two N-terminal V domains are linked to a chitin-binding domain (Cannon et al. 2002) (Fig. 3). Although it is not clear yet whether VCBP is involved in host defense mechanisms, the existence of germ line diversity in amphioxus suggests its role in non-self recognition. In contrast, no strict homologues of the invertebrate Igsf members hemolin (Sun et al. 1990) or the mollusk defense molecule (MDM) (Hoek et al. 1996) were found in the Ciona genome.
Fig. 3

Ciona Igsf members of immunological interest. Domain structures of the Ciona molecules with Ig domain are schematically shown. Three Ig domains, V, C1-like (C1?) and C2 are shown in yellow, green and blue, respectively. Other domains shown in different colors are as follows: CBD chitin-binding domain, LR leucine-rich, CR cysteine-rich, PEROX peroxidase, TM transmembrane, CY cytoplasmic. Names of each gene are presented at the left, with its copy number in the Ciona genome in parentheses. Gene name abbreviations are as follows: VCBP V region-containing chitin-binding proteins, JAM junction adhesion molecule, CTX cortical thymocyte marker of Xenopus, PVR poliovirus receptor

The role of peroxidasin in innate immunity is related to its ability to generate reactive oxygen. It does not contain any V domain among its four Igsf elements. A good homologue of the Drosophila and human peroxidasin was identified in Ciona (Fig. 3).

Molecules involved in phagocytosis and pathogen recognition

Phagocytic activity of coelomocytes or blood cells plays a very important role in innate immunity of invertebrates and vertebrates. It has been observed that (1) ascidian coelomocytes undergo both opsonin-dependent and -independent phagocytosis, (2) soluble C-type lectins and C3 in hemolymph function as opsonins in ascidians, and (3) superoxide anions are produced in ascidian phagocytic cells (Azumi et al. 2002; Nonaka et al. 1999; Sekine et al. 2001). The Ciona genome has six gene models of integrin α and five gene models of integrin β. In Halocynthia, CR3 composed of integrin α and β chains has been shown to play a critical role in C3-dependent phagocytosis (Miyazawa et al. 2001).

C-type lectins (CTLs) recognize carbohydrates derived from pathogens in a calcium-dependent manner, and participate in phagocytosis of microorganisms (Ezekowitz and Stahl 1988). Table S2, Electronic Supplementary Material, lists 16 gene models with (an) extracellular CTL domain(s) and a transmembrane segment; roughly half of these models showed significant homology to the mammalian macrophage mannose receptor, which binds to the mannose-rich glycoproteins of pathogens and mediates their opsonin-independent phagocytosis. The Ciona genome also contains three gene models that encode homologues of mammalian CD36 (E < 10^-70). In mammals, CD36 is a scavenger receptor of type B that recognizes lipids, including oxidized low-density lipoprotein, as well as various microbial proteins, and is capable of mediating phagocytosis of apoptotic cells (Ren et al. 1995). CD36 is also a candidate for an opsonin-independent phagocytosis receptor of Ciona. These observations suggest that Ciona coelomocytes are equipped with phagocytosis machinery analogous to that in mammals (Fig. 2).

With respect to secreted or cytoplasmic pathogen-recognizing proteins, we identified one Gram-negative binding protein (GNBP) model and one lipopolysaccharide-binding protein (LBP) model; these proteins are known to be involved in innate immunity in mammals and insects (Lee et al. 1996). However, a homologue of the peptidoglycan recognition protein (PGRP) was not identified in Ciona.

Molecules involved in inflammatory reactions

A systematic search of the Ciona genome revealed no direct homologues of mammalian chemokines, interferons or their receptors. The presence of one TNF-like gene and three TNFR-like genes does suggest, however, that Ciona has the ability to mount inflammatory reactions. In mammals, inflammation is initiated by the hydrolysis of arachidonyl phospholipids by phospholipase A2 and the subsequent release of arachidonic acid. Prostaglandins and leukotrienes, which are responsible for various effects seen in inflammation, are subsequently synthesized from arachidonic acid by cyclooxygenases and lipoxygenases. The Ciona genome contains several gene models believed to encode the homologues of mammalian phospholipase A2, cyclooxygenases and lipoxygenases.

Among the effector molecules involved in inflammation are anti-microbial peptides. A search of the Ciona genome failed to identify the homologues of anti-microbial peptides such as defensin and cecropin (Hoffmann and Reichhart 2002).

Other signal transduction pathways

We also performed a Blast search of the Ciona genome, using as probes signaling molecules and transcription factors that were not described above and that have important functions in mammalian immunity and/or hematopoiesis. Tables S3 and S4 in the Electronic Supplementary Material summarize the Ciona versions of signaling molecules and transcription factors, respectively. In general, most of the signaling molecules and transcription factors appear to be shared between Ciona and mammals. It is notable that the number of members identified in Ciona for each transcription factor or signaling molecule category is approximately one third to one fourth of that in mammals.

Prelude to adaptive immunity

Activating and inhibitory receptors

The vertebrate immune response is modulated by the integration of opposing signals generated by activating and inhibitory receptors (Ravetch and Lanier 2000). Many of these receptors are found on natural killer (NK) cells, which function at the interface of innate and adaptive immunity. Modulation is achieved by mediating or suppressing ligand-induced signaling through an immunoreceptor tyrosine-based activation motif (ITAM) or an immunoreceptor tyrosine-based inhibitory motif (ITIM), respectively (Ravetch and Lanier 2000; Reth 1989). The majority of these receptors belong to either the C-type lectin (CTL) or immunoglobulin superfamilies.

Activating and inhibitory receptors often occur in pairs with slightly different characteristics. While inhibitory receptors are identified by their intracellular ITIM signaling motif, activating receptors usually do not contain any signaling domains; rather, they pair with a common adaptor molecule (e.g., DAP12), which contains the ITAM sequence. The assembly of the activating receptor and signaling adaptor molecule is mediated by characteristic charged residues within the transmembrane domains; the activating receptor contains a positively charged residue while the adaptor contains a negatively charged residue.

A survey of the Ciona genome (Table S5, Electronic Supplementary Material) revealed a large number of gene models which predicted type I and type II single-pass membrane proteins containing extracellular CTL or Ig domains coupled to intracellular ITIM motifs. Only one membrane protein was found with an ITAM motif; however, all the gene models that did not have an intracellular ITIM did contain a characteristic positive charge in the transmembrane region. While direct ancestors of vertebrate receptor families (e.g., KIRs, NKG2) were not identified, there were many intriguing homologies, including one molecule with strong overlap to DC-SIGN with an intracellular ITIM (Table S5, Electronic Supplementary Material). The Ciona genome also contains gene models capable of encoding ZAP-70 and syk, as well as SHP-1, which interact with the ITAM and ITIM motif, respectively (Fig. 2; Table S3, Electronic Supplementary Material). In addition, genes that encode phospholipase C gamma, calcineurin and the transcription factor NFAT (nuclear factor of activated T cells) were also identified (Table S4, Electronic Supplementary Material). Thus, in addition to the TLR/IL-1R pathway, Ciona coelomocytes are most likely equipped with a calcium-dependent signaling pathway regulated through the ITAM/ITIM motif.
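The screening logic described above (an intracellular ITIM marks a candidate inhibitory receptor; an ITAM or a positively charged transmembrane residue marks a candidate activating receptor) can be illustrated with a minimal sketch. The regular expressions use commonly cited consensus patterns for the two motifs, which are assumptions here and are not stated in this paper; the function name, the input format, and the example sequences are hypothetical and do not reproduce the authors' actual search procedure.

```python
import re

# Commonly cited consensus patterns (assumptions, not taken from this paper):
#   ITIM: (I/V/L/S)-x-Y-x-x-(L/V)
#   ITAM: Y-x-x-(L/I)-x(6-8)-Y-x-x-(L/I)
ITIM = re.compile(r"[IVLS].Y..[LV]")
ITAM = re.compile(r"Y..[LI].{6,8}Y..[LI]")
POSITIVE = set("KR")  # positively charged residues expected in the TM segment


def classify_receptor(tm_segment, cytoplasmic_tail):
    """Assign a tentative class from the TM segment and cytoplasmic tail.

    Both arguments are one-letter amino acid strings. ITIM is checked first,
    so a tail matching both motifs is reported as inhibitory.
    """
    if ITIM.search(cytoplasmic_tail):
        return "candidate inhibitory (ITIM)"
    if ITAM.search(cytoplasmic_tail):
        return "candidate activating (ITAM)"
    if any(residue in POSITIVE for residue in tm_segment):
        return "candidate activating (charged TM, adaptor-dependent)"
    return "unclassified"


# Hypothetical sequences, for illustration only.
print(classify_receptor("LLVAGLLGIVLLAA", "AAVTYAQLAA"))
print(classify_receptor("LLVAGLLGIVLLAA", "AAYQPLGGGGGGGYSDLAA"))
print(classify_receptor("LLVAKLLGIVLLAA", "GGSGG"))
```

Such pattern matches only nominate candidates; short motifs of this kind occur by chance, so topology prediction and domain context would still be needed to support any assignment.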

Another characteristic of the vertebrate activating and inhibitory receptors is their genomic organization into clusters of closely linked genes, called the NKC (with CTL superfamily members) and LRC (containing Ig superfamily members) (Diefenbach et al. 2000). For CTL family members, this organization has been found to be conserved in the bony fish (A. Sato and J. Klein, personal communication). The Ciona genome did not show any large clusters of potential receptors; however, several pairs of homologous genes were found, suggesting recent duplication events (Table S5, Electronic Supplementary Material). These included one pair that could represent an inhibitory receptor (grail 61.29.1, which contains an ITIM) and an activating receptor (grail 61.22.1, with a positive charge within the TM segment; Table S5, Electronic Supplementary Material). In summary, analysis of the Ciona genome reveals that ITIM- and ITAM-bearing receptors likely had an early evolutionary origin and, more importantly, suggests that both activating and inhibitory components may have ligands which are not homologous to vertebrate MHC molecules.

Consistent with this, an ITAM-containing receptor involved in phagocytosis and inflammation has been reported in another ascidian, Halocynthia roretzi (Takahashi et al. 1997).

Lymphocyte CD markers

The only Igsf CD marker that has been unambiguously located in Ciona is a CD166 homologue, which may be involved in various physiological processes, including thymus development, neural cell migration and osteogenesis (Fig. 3). However, CD6, which binds to CD166 or CD5, has not been found in the Ciona genome. A systematic search with the Igsf CD markers expressed specifically in leukocytes revealed no clear counterparts in Ciona. Genes closely related to CD8 or CD4 have not been found. However, some non-Ig members expressed in mammalian hematopoietic tissues (although not specifically in lymphocytes) have homologues in Ciona with remarkably well-conserved structures (Table S6, Electronic Supplementary Material).

Igsf members related to mammalian receptors with V-C2 or V-C1 architecture

Among the molecules having this type of receptor architecture, Ciona has molecules related to two families of mammalian adhesion molecules: the CTX/JAM family (V-C2) and the nectin family (V-C1-like-C2) (Fig. 3). Both families have members expressed in the lymphocytes or hematopoietic cells of vertebrates, although their expression is not restricted to these cells.

A Ciona gene codes for a member of the CTX/JAM family (V-C2-TM-CY domains) (Chretien et al. 1998). The sequence of the predicted Ciona molecule is more similar to the JAM type (the adhesion molecule type) than to the CTX type, which is expressed on the surface of amphibian and bird thymocytes. However, the Ciona JAM-like molecule also has some of the CTX characteristics, i.e., the extra disulfide bridge in the C domain (missing in vertebrate JAM members). Hence, the Ciona sequence presumably corresponds to a gene that existed prior to the divergence into the CTX and JAM types.

Two Ciona genes with strong similarity to nectins have been found (Fig. 3). Earlier comparisons and analysis of the nectins and poliovirus receptors (PVR) had shown that their first constant domain belonged to the same category as that of tapasin; i.e., it was a recognized C1 domain (Du Pasquier 2000). Therefore, the nectin family is a likely candidate for the precursor of antigen receptors and tapasin (also harboring a V-C1 pair), and may have donated the Ig-like domain of MHC class I and II molecules. Interestingly, the JAM, CTX, PVR, and PVR-related genes are linked in mammals and have several paralogues on human chromosomes 3, 11, 1, 19 and 12 (Du Pasquier, unpublished). In addition to the CTX/JAM and nectin families, a homologue of CD166 (V-V-C2-C2-C2) was also identified in Ciona (Fig. 3).

A pre-duplicated form of the MHC precursor region in the Ciona genome

In the human genome, closely linked sets of paralogous genes often occur on more than two, and typically four, distinct chromosomal segments (Ohno 1999). The MHC is one of the best-characterized regions which exhibit genome paralogy (Flajnik and Kasahara 2001). Of the more than 100 genes mapped to the human MHC (located on 6p21.3), nearly 40 have one to three paralogous copies on specific regions of chromosomes 1, 9, and 19. Previous studies support the thesis that the MHC and the paralogous regions on chromosomes 1, 9, and 19 arose as a result of two rounds of block (or genome-wide) duplication that took place before the emergence of jawed vertebrates, but after the emergence of cephalochordates (Abi-Rached et al. 2002; Flajnik and Kasahara 2001). Hence, the Ciona genome is likely to contain a region that can be regarded as a common ancestor of the MHC and the three paralogous regions.

To test this possibility, we carried out Blast searches of the Ciona genome using human genes with multiple copies in the four MHC paralogous regions as queries. This analysis showed that the genes identified typically occur in a single copy in Ciona, consistent with a ‘one-to-four’ rule (see Fig. S9, Electronic Supplementary Material, for phylogenetic analyses of representative genes). Thus, it appears that the Ciona genome has not experienced the two rounds of duplication which gave rise to the MHC and its three paralogous regions.
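The copy-number comparison behind the ‘one-to-four’ rule can be sketched as a simple tally. This is an illustrative sketch only: the function names, the input format, the species labels, and the placeholder family names are hypothetical, and the threshold (a single Ciona copy against two to four human paralogues) is one possible reading of the rule rather than a criterion stated by the authors.

```python
from collections import Counter


def copy_number_summary(hits):
    """Tally gene copies per family and species.

    hits: iterable of (family, species) pairs, one pair per gene,
    with species given as "human" or "ciona".
    Returns {family: (human_copies, ciona_copies)}.
    """
    counts = Counter(hits)
    families = sorted({family for family, _ in counts})
    return {
        family: (counts[(family, "human")], counts[(family, "ciona")])
        for family in families
    }


def fits_one_to_four(human_copies, ciona_copies):
    """True when one Ciona gene corresponds to two to four human paralogues."""
    return ciona_copies == 1 and 2 <= human_copies <= 4


# Placeholder families, for illustration only (not data from the paper).
hits = (
    [("family_A", "human")] * 4 + [("family_A", "ciona")]
    + [("family_B", "human")] * 2 + [("family_B", "ciona")] * 2
)
for family, (human, ciona) in copy_number_summary(hits).items():
    print(family, human, ciona, fits_one_to_four(human, ciona))
```

A tally of this kind only flags candidates; whether the human copies really descend from a single ancestral gene requires the phylogenetic analysis that the text refers to.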

We next performed a detailed genome map analysis of the scaffolds picked up by the Blast searches described above. The results summarized in Fig. 4 and Fig. S8, Electronic Supplementary Material, show that there is a strong tendency for the Ciona genes (whose human counterparts map to the MHC and its three paralogous regions) to occur adjacent to each other. These observations suggest that paralogous copies on the four MHC paralogous regions most likely arose by block duplication of gene clusters (see Fig. S9, Electronic Supplementary Material, for phylogenetic analyses of representative genes). Furthermore, some Ciona genes are linked to one another in a manner similar to their human counterparts that map to the MHC (e.g., BAT5-like and BAT2-like genes in scaffold 114, and NOTCH-like and tenascin-like genes in scaffold 366) and their amphioxus counterparts located in a proto-MHC (e.g., WDR5-like and BRD-like in scaffold 151). While the lack of information on how individual scaffolds are positioned relative to one another precludes us from drawing definitive conclusions, the available evidence is consistent with the hypothesis that Ciona has a region (or multiple fragmented regions derived therefrom) that qualifies as a common ancestor of the MHC and the three paralogous regions.
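The linkage observation can be illustrated by grouping Blast hits by scaffold and keeping scaffolds that carry two or more genes of interest. This is a minimal sketch, not the authors' mapping procedure: the function name, the input format, and the `min_genes` threshold are assumptions, and the singleton entry is a made-up placeholder. The three linked pairs are the examples named in the text.

```python
from collections import defaultdict


def linked_candidates(hits, min_genes=2):
    """Group genes by scaffold and keep scaffolds with at least min_genes hits.

    hits: iterable of (scaffold, gene_name) pairs for Ciona genes whose human
    counterparts map to the MHC or its paralogous regions.
    """
    by_scaffold = defaultdict(list)
    for scaffold, gene in hits:
        by_scaffold[scaffold].append(gene)
    return {
        scaffold: sorted(genes)
        for scaffold, genes in by_scaffold.items()
        if len(genes) >= min_genes
    }


hits = [
    ("scaffold 114", "BAT5-like"),
    ("scaffold 114", "BAT2-like"),
    ("scaffold 366", "NOTCH-like"),
    ("scaffold 366", "tenascin-like"),
    ("scaffold 151", "WDR5-like"),
    ("scaffold 151", "BRD-like"),
    ("scaffold X", "gene_Z-like"),  # hypothetical singleton, illustration only
]
for scaffold, genes in sorted(linked_candidates(hits).items()):
    print(scaffold, genes)
```

Co-occurrence on a scaffold is a coarser criterion than adjacency; a fuller analysis would also compare gene order and spacing within each scaffold.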
Fig. 4

Representative scaffolds indicating that the Ciona genome contains a pre-duplicated form of an MHC precursor-like region. Genes are color-coded according to the following criteria: red, corresponding human genes have paralogous copies in the MHC and at least one of the regions paralogous to the MHC; yellow, corresponding human genes map to the MHC or its vicinity, and apparently have no paralogous copies elsewhere; green, corresponding human genes have no copies in the MHC, but share paralogous copies among at least two of the regions paralogous to the MHC; blue, corresponding human genes map to one of the regions paralogous to the MHC; black, corresponding human genes map outside the MHC or the regions paralogous to the MHC; and white, genes or gene fragments with no apparent similarity to vertebrate genes or those that show higher sequence similarity to non-vertebrate genes. Chromosomal localization of corresponding human genes is given in parentheses following gene names. The regions of the human genome, which have been established to be paralogous to the MHC, are 9q32-q34 (extending to 9p13-p24/9q21-q22), 19p13.1-p13.3, and 1q21-q25/1p11-p32 (extending to 1q31-32/1p34-p36). Other regions of the human genome proposed to be paralogous to the MHC, but not defined as the regions paralogous to the MHC in Fig. 4, or Fig. S8, Electronic Supplementary Material, include 12p11-p13, 7q22, 15q21-q26, and 21q22.3. Note that several Ciona genes in Fig. 4 and Fig. S8, Electronic Supplementary Material, have corresponding human genes on these regions

Discussion

The draft genome sequence of C. intestinalis has provided the first opportunity to compare the genome of an invertebrate deuterostome with those of some vertebrates and protostomes. The comparison between the higher vertebrate and Urochordate genomes is particularly interesting for immunity-related genes, since the immune system is believed to have changed drastically with the emergence of adaptive immunity at the jawed vertebrate stage. None of the genes that play pivotal roles in adaptive immunity, such as Ig, TCR, MHC class I and II, RAG1 and 2, and AID, were located in the Ciona genome. As more than 95% of the Ciona genome is covered by the assembly that was used for this analysis (Dehal et al. 2002), we conclude that most of these genes are indeed absent from the Ciona genome rather than having merely escaped detection. This result therefore provides the first definitive evidence that the origin of adaptive immunity can be traced back to the interval between the emergence of urochordates and of jawed vertebrates.

The genes involved in innate immunity in deuterostomes can be classified into two major groups: those shared by protostomes and deuterostomes, and those found only in deuterostomes (or chordates). The presence of genes of the former group in the Ciona genome is not surprising, as they are present in the human, fruit fly and worm genomes. Typical examples of this group are TLR and the genes involved in its signal transduction. Only three TLR genes were found in Ciona, indicating less expansion than in humans or the fruit fly, which each have about ten TLR genes. It should be noted, however, that in the fruit fly, only one of these genes has a clear immunological function whereas the remaining ones are mainly developmental genes. It is thus likely that the expansion of the innate immune recognition repertoire through duplication of TLR genes is a strategy adopted only by higher vertebrates.

Many domains used by vertebrate immunity-related genes are also shared by deuterostomes and protostomes. To elucidate the phylogenetic origins of the protein domains found in vertebrate immunity-related molecules, we compared the number of each domain encoded by the genomes of the following organisms: human (H), Ciona (C), fruit fly (F), worm (W), yeast (Y) and thale cress (T) (Venter et al. 2001) (Table 1). As shown in this table, domains no. 27–37 are shared by deuterostomes and protostomes. It is interesting to note that most of the vertebrate complement domains are members of this group. However, the specific combinations of domains uniquely found in vertebrates are apparently absent in protostomes. The second group of domains (no. 22–26) includes those shared only by the human and Ciona genomes. This group of domains seems to be either deuterostome- or chordate-specific. The two complement domains C1q and MACPF belong to this group. Therefore, the deuterostome-specific part of the innate immune system, such as the complement system, seems to have been established not only by the reshuffling of preexisting domains but also by the innovation of new domains. Domains no. 1–21 are found only in the human genome. This group of domains can be regarded as vertebrate-specific and includes MHC class I and II, small anti-microbial peptides, and cytokines. These domains most probably appeared in the jawed vertebrate lineage simultaneously with the emergence of adaptive immunity. The presence of C1q in Ciona was totally unexpected, since mammalian C1q mainly binds to Ig. However, human C1q also binds to C-reactive protein (CRP), a member of the pentaxin family. Because Ciona has pentaxins (Table 1), pentaxin might have been the original partner of C1q. Although this possibility needs to be confirmed experimentally, it is interesting to note that both C1q and pentaxin domains occur only in a deuterostome or chordate lineage (Table 1).
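The three-way grouping of domains described above follows directly from which genomes encode each domain, and can be sketched as a small classifier. The species letters match those used in the text (H, C, F, W, Y, T); the function name, the "other" fallback category, and the example counts are hypothetical and do not reproduce values from Table 1.

```python
def classify_domain(counts):
    """Assign a domain to a phylogenetic group from per-genome counts.

    counts: dict mapping species letter (H, C, F, W, Y, T) to the number of
    occurrences of the domain in that genome.
    """
    present = {species for species, n in counts.items() if n > 0}
    if present == {"H"}:
        return "vertebrate-specific"
    if present == {"H", "C"}:
        return "deuterostome- or chordate-specific"
    if present & {"H", "C"} and present & {"F", "W"}:
        return "shared by deuterostomes and protostomes"
    return "other"


# Hypothetical counts, for illustration only (not values from Table 1).
examples = {
    "domain_1": {"H": 12, "C": 0, "F": 0, "W": 0, "Y": 0, "T": 0},
    "domain_2": {"H": 20, "C": 6, "F": 0, "W": 0, "Y": 0, "T": 0},
    "domain_3": {"H": 50, "C": 15, "F": 9, "W": 4, "Y": 0, "T": 0},
}
for name, counts in examples.items():
    print(name, classify_domain(counts))
```

Presence or absence inferred from draft assemblies is provisional, so a zero count is better read as "not detected" than as proof of absence.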

Even though none of the essential genes of the adaptive immune system have been found in the Ciona genome, several genes related to the antigen receptor architecture were detected (Fig. 3). In most cases, only one or two copies of such genes have been found. Structurally, they resemble non-somatically generated TCR or Ig polypeptides. They have V domains, and some even have C1-related domains, which are specific to the molecules involved in the Gnathostome immune system. From the available cDNA data we deduce that they are not expressed specifically in Ciona coelomocytes. Their vertebrate relatives are also not lymphocyte-specific except in a few cases such as CTX. The interesting aspect of the PVR/CTX/JAM family in mammals is that all of the members function as virus receptors. If this is a conserved function of these proteins, it could have provided the impetus to co-opt them into the immune system. The mode of interaction between a virus and the receptor has been partially elucidated for JAM (Barton et al. 2001), and the mechanisms involved may provide insight into the origin of the specific antigen receptor. Binding of the virus to the JAM V region alone causes internalization of the virus, whereas binding to the V region of JAM together with binding to sialic acid creates enough of a signal to activate NFκB and lead to apoptosis, reminiscent of costimulation. This could represent a pathway for the Igsf members to enter the immune system prior to establishment of a rearranging mechanism. Given the homology of the V-C1 of the PVR molecules with the V-C1 of tapasin and tapasin-related molecules (Du Pasquier 2000), it even provides an explanation for the origin of the Ig C1 of bona fide MHC I and II molecules. The PVR molecules are not present in protostomes and as such they represent a step in the evolution of the deuterostome lineage. Their linkage to NFκB activation and apoptosis (at least for the JAM representatives) in mammals provides a link to conserved innate immune pathways and hints at some continuity in the evolution of immune systems.

In the same vein, the Ciona genome contains a number of potential inhibitory and activating receptors, and their cognate signal transduction molecules, without any identifiable MHC genes. In contrast, the majority of ligands for these receptors in the vertebrates are either MHC or MHC-related molecules, related either by sequence or, potentially, by structure (Arase et al. 2002; Diefenbach et al. 2000). While no direct predecessors of any vertebrate receptor family have been identified, and it would not be surprising if signaling pathways were co-opted during evolution, these results do suggest that this activation/inhibition system had an early evolutionary origin with respect to immunity. This supports the emerging idea that cells that express these receptors, such as NK cells, have an important and perhaps unexpected role in vertebrate immunity (Cerwenka and Lanier 2001).

In conclusion, the search for and analysis of immunity-related genes in Ciona suggest the presence of a well-developed and unique innate immune system in urochordates. Although some possible precursors of the jawed vertebrate adaptive immune system were identified, they appear to still be distant from functional adaptive immunity equipped with somatic mechanisms for generation of diversity. The evolutionary process leading to the emergence of adaptive immunity will be further clarified by genome analyses of cephalochordates and cyclostomes.

Supplementary material

Methods

supp_meth.pdf (36 kb)

Tables S1-6

supp_tabl_1-6.pdf (52 kb)

Figure Legends S1-9

supp_fig_leg.pdf (55 kb)

Figures S1-7

supp_fig_1-7.pdf (29 kb)

Figures S8-9

supp_fig_8_9.pdf (48 kb)

Copyright information

© Springer-Verlag 2003