Journal of Molecular Evolution, Volume 63, Issue 5, pp 691–706

Ancient Phylogenetic Beginnings of Immunoglobulin Hypermutation

  • Jaroslav Kubrycht
  • Karel Sigler
  • Michal Růžička
  • Pavel Souček
  • Jiří Borecký
  • Petr Ježek

Abstract

Many structures and molecules closely related to those involved in the specific process of immunoglobulin (Ig) hypermutation existed before the appearance of primordial Ig genes. Consequently, these structures can be found even in animals and organisms distinct from vertebrates; likewise, homologues of hypermutation enzymes are present in a broad range of species, from bacteria to mammals. Our analysis, based predominantly on primary structure, demonstrates the existence of molecules similar to Ig domains, variable Ig domains (IGv), and antigen receptors (AR) in unicellular organisms, nonvertebrate metazoans, and nonvertebrate Coelomata, respectively. In addition, we deal here with some important structural properties of CDR1-like segments of the selected sponge adhesion molecule GCSAMS exhibiting chimerical Ig domain similarities, and demonstrate the occurrence of conserved regions corresponding to Ohno’s modern intact primordial building block in the C-terminal part of IGv-related segments of nonvertebrate origin. The results of our analysis are also discussed with respect to the possible phylogeny of molecules preceding the hypothetical common antigen receptor ancestor.

Keywords

BLAST, CDR1, Domain similarity, Geodia cydonium, Hypermutation, Immunoglobulin, Primordial building block, Protein kinase substrate, Template

Introduction

Immunoglobulin (Ig) hypermutation is a selective, site-specific, and extremely frequent type of somatic mutation occurring almost exclusively in B cells (Gordon et al. 2003). This process is related to hypermutation of T-cell receptors in T cells (Oprea and Kepler 1999) and participates, among other things, in the maturation of human antibodies. It also initiates the isotype switch of human Ig genes (Banerjee et al. 2002; Poltoratsky et al. 2004) and gene conversion of the corresponding genes of chicken origin (Di Noia and Neuberger 2004).

Two highly conserved families, i.e., the APOBEC family and the Y family of DNA polymerases (Wedekind et al. 2003; Yang 2005), include important mutator enzymes involved in Ig hypermutation. The members of these families can usually be characterized by RPS BLAST similarities with sequences of three conserved domains (cd01283 and the closely related pair, pfam00817 and COG0389), which otherwise determine superior superfamily relationships in a broad range of organisms, from bacteria (Brotcorne-Lannoye et al. 1986; Yang et al. 1992; Reuven et al. 1999; Bjedov et al. 2003) to vertebrates. Activation-induced cytidine deaminase (AID), a member of the APOBEC family (indicated by cd01283), represents a proposed main DNA mutator of Ig genes (Muramatsu et al. 2000; Petersen-Mahrt et al. 2002; Beale et al. 2004). Surprisingly, neither this enzyme nor its familial homologues have yet been detected in model Urochordata genomes of the genus Ciona (Azumi et al. 2004; also in accordance with our recent BLAST searches including associated domain projections). AID can also trigger the accompanying Ig hypermutation effects of the error-prone enzymes of the Y family of DNA polymerases mentioned above (Poltoratsky et al. 2004). Three members of this Y family (i.e., the catalytic subunits of DNA polymerases η and ι and the possible recognition subunit Rev1 of the polymerase ζ complex) were found to participate in the corresponding error-prone replication (Faili et al. 2002; Simpson and Sale 2003; Zeng et al. 2004). The subunits of mutator enzymes involved in Ig hypermutation can also be important for some steps of carcinogenesis (Simhadri et al. 2002; Diaz et al. 2003; Okazaki et al. 2003; Washington et al. 2003; Faili et al. 2004; Lossos et al. 2004; Malpeli et al. 2004) and possibly also in recombination (Bradshaw et al. 2002), both concerning non-Ig genes.
In agreement with these less specific reactions and preceding familial/superfamilial relationships, the enzymes involved in Ig hypermutation possibly evolved from ancestors of different function. On the other hand, the ancestor’s relationship to more frequently hypermutating shorter segments such as CDR1 and CDR2 remains unclear.

In contrast to the well-defined conserved domain relationship of hypermutation enzymes described above, very little is known about the possible phylogenetic relationship to their DNA “macrosubstrates.” The corresponding hypermutating DNA regions (HDR), about 1.1 kb in length (Rada et al. 1997), include not only exon sequences of Ig genes but also intron sequences involved in class switch (Muramatsu et al. 2000). Though members of the immunoglobulin superfamily were most frequently detected in similar HDR, such regions also contain segments encoding members of different protein superfamilies (Morelli et al. 2002; Gordon et al. 2003). Such HDR were also found in DNA extracted from tumor cells of various cell lineages different from B cells, i.e., cells of melanoma, mesothelioma, and carcinoma origin (Morelli et al. 2002). Recent data suggest that AID, which is predominantly instrumental in the Ig-related hypermutation of short tetranucleotides (“microsubstrates”) of the general motif structure DGYW/WRCH (Rogozin and Diaz 2004), binds to “G-loops” within transcribed S regions (Duquette et al. 2005). Consequently, this interaction may restrict the occurrence of at least some macrosubstrate HDR.

In spite of the scant knowledge about the evolution of HDR, the evolution of Ig superfamily members and Ig domains (most frequently associated with HDR) has been investigated for a long time (Williams and Barclay 1988; Halaby and Mornon 1998; Gordon et al. 2003). This also holds for the evolution of Ig genes. We may distinguish four eras in the evolution of Ig genes important also for their hypermutation phylogeny: (i) prehistory (more than 1000 to 500 million years ago [MYA]) (Marchalonis et al. 2002; Du Pasquier et al. 2004), (ii) beginnings of evolution of primordial antigen receptors (AR) (Litman et al. 1999; Lee et al. 2002; Suzuki et al. 2004) (we consider a period of at least 500–480 MYA), (iii) “big bang” period in the evolution of Ig germline gene reshuffling and rearrangement (Bernstein et al. 1996a; Marchalonis and Schluter 1998; Schatz 2004) (480–450 MYA), (iv) period of gradual evolution of these processes (Sitnikova and Su 1998) (450 MYA–present) and appearance of hypermutation-dependent isotype switch (Zarrin et al. 2004).

The data presented here concern the “prehistory” period of Ig gene evolution. Like several other studies, this paper includes the sequence study of molecules closely related to the immediate phylogenetic (“prehistorical”) precursors of the hypothetical common ancestor of antigen receptors (PPCAR) possibly encoded by rearranging genes (Litman et al. 1999; Du Pasquier et al. 2004; Pancer et al. 2004; van den Berg et al. 2004). Such PPCAR perhaps participated in recognition of viruses (Du Pasquier et al. 2004). Nevertheless, a portion of this paper reports a sequence study of phylogenetically more distant nonvertebrate molecules containing conserved domain similarities to variable Ig domains (IGv) or otherwise markedly similar to AR. This means that we also look for molecules exhibiting a possible structural relationship to earlier PPCAR variants of considered involvement in cell adhesion or clearance processes (Tilson and Rzhetsky 2000; Marchalonis et al. 2002). In consequence of this and our previous studies (Kubrycht et al. 2002, 2004), we also demonstrate the existence of two interesting regions of different Ig domain location in selected molecules. In addition, some examples of bacterial and even archaeal Ig or Ig-like domains are also mentioned.

Materials and Methods

Programs, Algorithms, and Formulas

General remarks on sequence alignments, limits, and standardization of database searches

For addresses of the on-line programs employed and additional comments on the standardization of the included BLAST procedures, see the web page (http://www.papersatellitesjk.com), chapter WP1. Except for this web page, BLAST searches and multiple sequence alignments (MSA) necessary for compilation of our tables were performed repeatedly from June 1, 2004, to September 30, 2005. No Entrez restrictions achieved a bit score lower than the standardized inferior bit score limits of BLAST searches (WP1.2). Chain identities of tridecanucleotide segments passed the standard Expect limit value of 10 in our oligonucleotide BLASTN scanning (see also the following paragraph and Fig. 1), whereas dodecanucleotides, which are one nucleotide shorter, were not usable in such searches. Except for abundant typical bacterial Ig domains and redundant alternative splicing forms, the total list of molecules selected by our search procedures is displayed in the tables. For a more detailed description of these procedures see below. For several MSA- and phylogram-related procedures see chapter WP4.
Fig. 1.

Absence of mRNA identities with oligonucleotides of GCSAMS(cdr1.1.com). a We show here bit profiles indicating GCSAMS sites of presence (higher values) or absence (lower values) of 100% sequence identities with human and mouse mRNA oligonucleotides of inferior possible lengths (IROIL). These profiles concern GCSAMS segment GF (GP: N132–203; see also WP2.1) and both its more extended envelope regions described in Restriction of the Segment of Oligonucleotide Dissimilarities Between the Investigated Sequence and the Compared Sequence Set (Materials and Methods). Thirteen- and fourteen-nucleotide-long (13N, 14N) oligonucleotides represent IROIL detectable by BLASTN searches in the corresponding available sequence sets (see also General Remarks on Sequence Alignments, Limits, and Standardization of Database Searches [Materials and Methods]). Displayed IROIL also include decomposed similarities of greater length. GP(5′)—positions of 5′ ends of correlated IROIL; h13, m13, and hm13—13N IROIL with human, mouse, and both human and mouse sequence sets, respectively; h14, m14, and hm14—corresponding 14N IROIL; DIF1p—profile including DIF1, a segment of major occurrence of oligonucleotide mRNA dissimilarities (ORD). DIF1 is determined by the simple selection algorithm described in Restriction of the Segment of Oligonucleotide Dissimilarities Between the Investigated Sequence and the Compared Sequence Set (Materials and Methods). b In accordance with our approach to oligonucleotide overlaps (see also the algorithm mentioned above), two pairs of envelope regions (ER) were used in our statistical evaluation of IROIL occurrence, i.e., ER of GF (GP: N126–209, N125–210) and ER of GCSAMS(cdr1.1.com) (GP: N168–194, N167–195). ER of GCSAMS(cdr1.1.com) are closely related to the position of the resulting DIF1 (GP: N166–192). 
Both human and mouse 13N ORD are present only in the selected DIF1 (p < 0.001; chi-square for 2 × 2 tables), whose position is closely related to the CDR1-like segment GCSAMS(cdr1.1.com). Similarly, the presence of hm14 in DIF1 is significantly increased (p < 0.001) with respect to both GF and ER of GF. For further comments and additional statistical evaluation see The Segment Almost Identical to CDR1-like Segment GCSAMS(cdr1.1.com) Exhibits the Least Similarity to Mammalian mRNA Sequences Within the Preformed GF region (Results).

Restriction of the segment of oligonucleotide dissimilarities between the investigated sequence and the compared sequence set

The segment of major occurrence of oligonucleotide dissimilarities (DIF1) was restricted by the occurrence of DIF1OL oligonucleotides, using the algorithm: DIF1OL = (h(N;0) OR m(N;0)) AND hm(N + 1;0), where m, h, and hm indicate the origin of compared sets, i.e., human, mouse, and both mRNA sequences, respectively; N is the minimal length of oligonucleotide identities found by (standard) BLASTN; and 0 denotes oligonucleotides without the corresponding oligonucleotide identity. In our approach, the segments extended by at least half of N and N + 1 length overlaps are correlated with DIF1, which usually determines either a single or a pair of N and N + 1 related envelope regions (ER).
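The Boolean rule above can be illustrated with a minimal Python sketch. It is not the authors' procedure: exact k-mer matching stands in for the BLASTN identity scan, hm(N + 1;0) is assumed to mean that the (N + 1)-mer has no identity in either the human or the mouse set, and the function names and toy sequences are hypothetical.

```python
def kmers_of(sequences, k):
    """Collect every k-mer that occurs in any of the given sequences."""
    found = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            found.add(seq[i:i + k])
    return found


def dif1ol_positions(query, human, mouse, n):
    """Return 5' positions in `query` that satisfy
    DIF1OL = (h(N;0) OR m(N;0)) AND hm(N + 1;0).

    Assumption: exact substring identity replaces the BLASTN scan, and
    hm(N + 1;0) means "absent from both the human and the mouse set".
    """
    human_n = kmers_of(human, n)
    mouse_n = kmers_of(mouse, n)
    both_n1 = kmers_of(human, n + 1) | kmers_of(mouse, n + 1)

    flagged = []
    # Only positions where an (N + 1)-mer still fits are evaluated.
    for i in range(len(query) - n):
        oligo_n = query[i:i + n]
        oligo_n1 = query[i:i + n + 1]
        h0 = oligo_n not in human_n      # no human identity of length N
        m0 = oligo_n not in mouse_n      # no mouse identity of length N
        hm0 = oligo_n1 not in both_n1    # no identity of length N + 1 in either set
        if (h0 or m0) and hm0:
            flagged.append(i)
    return flagged


# Toy data (hypothetical); the study used N = 13 against mRNA sets.
positions = dif1ol_positions("ACGTTGCA", ["ACGTAAAA"], ["TTTTTGCA"], n=3)
print(positions)
```

The flagged 5' positions would then delimit a candidate DIF1 segment; how overlaps extend it into envelope regions is left out of this sketch.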

Amino acids (aa) of substantially different column occurrences

Let X and Z be any different aa occurring in the same sequence block column more frequently than follows from considered random probabilities where at least one of them is less frequent than the usual or reevaluated high occurrence aa (for details see WP 3.3 and 3.4.4).

Assumption 1: Let column occurrence of X determine more than one model random event (i.e., one model aa) higher model (reference) twin (sequence block) column (MTC; for details see WP5.4) than the Z occurrence does. This means that LEX − LEZ > 1, where LEX, LEZ are length equivalents of X and Z in a given column, respectively (see also WP3.3 and Kubrycht et al. 2002).

Assumption 2: Similarly, let the ratio of X and Z probabilities (relative X probability) determine a probability lower than random occurrence of X, i.e., the corresponding MTC is higher than one in accordance with the formula: log(cX/cZ) : log(aX) > 1, where cX, cZ are the given aa probabilities of X and Z in a given column, and aX is the considered aa probability (for topical values see Kubrycht et al. 2002).

Definitions: Provided that the same X and Z are in agreement with both assumptions, we say that X is an aa of superior occurrence (SO aa), whereas Z is an aa of inferior occurrence. In the cases of other relationships between X and Z, both X and Z are named residual alternative aa (RA aa). For the purpose of these definitions see Formation of Hybrid Templates, below.

Statistical tools

Chi-square evaluation for 2 × 2 tables, its continuity-corrected modification, odds ratio calculation, and two-tailed tests were used in our statistical evaluations (Lepš 1996; Zvárová 2001).
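As an illustration of these tools, the following Python sketch computes the chi-square statistic for a 2 × 2 table, optionally with a continuity correction, together with the odds ratio. It assumes the continuity-corrected variant is the usual Yates correction, which the text does not state explicitly; the function names and the example counts are hypothetical and are not data from this study.

```python
import math


def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square test for the 2 x 2 table [[a, b], [c, d]].

    Returns (statistic, two-tailed p-value) for one degree of freedom.
    With yates=True the continuity correction |ad - bc| - n/2 is applied
    (assumed here to be the correction meant in the text).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column total must be positive")
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    statistic = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def odds_ratio(a, b, c, d):
    """Odds ratio (a*d)/(b*c); undefined when b or c is zero."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined for zero off-diagonal cells")
    return (a * d) / (b * c)


# Hypothetical counts, e.g. oligonucleotides inside/outside a segment
# that do/do not show identities.
stat, p = chi_square_2x2(10, 20, 30, 40)
stat_corrected, p_corrected = chi_square_2x2(10, 20, 30, 40, yates=True)
print(stat, p, stat_corrected, p_corrected, odds_ratio(10, 20, 30, 40))
```

The correction always lowers the statistic, so it gives the more conservative p-value when cell counts are small.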

Preselection and Selection Procedures Involved in the Restriction of Displayed Sequence Sets

Restriction of sequence sets presented in Tables 1 and 2

Preselection procedures included different forms of searches: (i) several BLAST search strategies described in our previous paper (Kubrycht et al. 2004) or employing domain sequences as queries, (ii) Boolean searches in the NCBI Protein Database, and (iii) PubMed searches resulting in reviews including extended lists of molecules (Halaby and Mornon 1998; Teichmann and Chothia 2000; Vogel et al. 2003; Du Pasquier et al. 2004). RPS BLAST detecting conserved domain similarities was used in the subsequent (final) step in this combined approach. Molecules containing segments similar to typical bacterial Ig-like domains (Big and BID) or metazoan IG domains (smart00409 or cd00096) were searched for in unicellular organisms, whereas IGv similarities selected the molecules of metazoan origin.
Table 1. NCBI conserved domain search for Ig and Ig-like domains in Archaea and Bacteria (a,b)

| Protein name (c) | GI | Species (taxonomy) | Group (taxonomy) | Max domain similarity (d) | Ig-related domain similarities (d) | Additional domain similarities (d) |
|---|---|---|---|---|---|---|
| Archaea | | | | | | |
| hyp P AF2119 | 11499702 | A. fulgidus | Euryarchaeota | BID_1 | BID_1, Big_1 | |
| hyp P MM1983 | 21228085 | M. mazei | ditto | COG3291 | Big_1, BID_1 | COG3291, PKD |
| Bacteria | | | | | | |
| bac SPC IGL | 27366598 | V. vulnificus | Proteobacteria | Big_2 | Big_2 | |
| Intimin | 7384863 | E. coli | ditto | Big_1 | Big_1, BID_1, Big_2, BID_2 | |
| Invasin | 55977770 | Y. pseudotub. | ditto | Big_1 | Big_1, BID_1 | |
| bac IGL P | 41815928 | T. denticola | Spirochetes | Big_2 | Big_2 | COG5492 |
| IglRDP1 | 45656363 | L. interrogans | ditto | Big_2 | Big_2, BID_2 | |
| ATS1 | 53729990 | D. aromatica | Proteobacteria | COG5184 | IG4, IGL | COG5184 |
| CP-933P | 15801953 | E. coli | ditto | IGcam | IGcam, IG9, IGL, IG4, IGc2 | |
| COG3210 | 48854852 | Cy. hutchinsoni | Bacteroidetes | IG4 | IG4, IGc2, IGL, IGcam, IG9 | |
| endogluc C | 121819 | Ce. fimi | Actinobacteria | GH_9 | IGcam, IG4, IGL | GH_9, CelD, CBM |
| ZP_00561298 | 68209254 | De. hafniense | Firmicutes | LytB | IGc2, IGcam, IG9, IGL, IG4 | CW_bin2, FN3 |

a The table reflects the very early origin of Ig and Ig-like domains. The presence of typical metazoan Ig domains in bacteria (the bottom group of displayed bacterial proteins) may follow from their very ancient origin or from lateral (horizontal) transfer of genes. For details, additional comments, and abbreviations, see the first and second sections of Results and both Methods and Discussion, respectively, and chapter WP5.

b A combined approach selecting the sequence sets displayed in this table and in Table 2 is described in Restriction of Sequence Sets Presented in Tables 1 and 2 (Materials and Methods). Some of these procedures were additionally standardized according to the limits described in WP1.2.

c Abbreviations related to selected molecules and species: ATS1, α-tubulin suppressor and related RCC1 domain-containing proteins; bac, bacterial; endogluc C, endoglucanase C; hyp, hypothetical (predicted); IGL, Ig-like domain (for details see footnote d); IglRDP1, Ig-like repeat domain protein 1; P, protein; SPC, surface protein containing; pseudotub., pseudotuberculosis.

d Only the first appearance of each domain type is presented in the column dealing with bit score derived orders of similarities. Maximal domain similarities of selected molecules are present in the column max. Ig and Ig-like domains: BID_1, BID_2, Big_1, Big_2: typical bacterial Ig-like domains; COG5492: more distinct bacterial Ig-like domains; IG4, IG9, IGc2: typical Ig domains present in Igs; IG4: smart00409; IG9: cd00096; IGcam: Ig domains of metazoan cell adhesion molecules; IGL: metazoan Ig-like domains different from preceding Ig domains. Shortened standard domain abbreviations: CBM: carbohydrate binding domain CBM_4_9; GH_9: glycosyl hydrolase family 9; CW_bin2: CW_binding_2, a putative cell wall binding repeat.

Table 2. Similarities of conserved variable Ig domains in nonvertebrate Eukaryota (a)

| Protein name (b) | GI | Species (c) | Group (c) | Max domain similarity (d) | IGv-related domain similarities (d) | Additional domain similarities (d) |
|---|---|---|---|---|---|---|
| Porifera | | | | | | |
| GCSAMLd1** | 5730148 | G. cydonium | D | IG4 | IG4, IGL, IGcam, ig, IGv4, IG9, IGc2, IGv9 | Second Ig domain |
| GCSAMSd1** | 5730150 | ditto | D | IG4 | IG4, IGL, IGcam, IGv4, ig, IG9, IGc2, IGv9 | Second Ig domain |
| GCSAMLd2* | 5730148 | ditto | D | IG4 | IGv4, IG4, IGcam, IGL | First Ig domain |
| GCSAMSd2* | 5730150 | ditto | D | IG4 | IG4, IGv4, IGcam, IGL | ditto |
| RTK* | 2285928 | ditto | D | TyrKc | IG4, IGv4, IGcam, IGL, IG9, ig | TyrKc, Pkinase, STKc, SPS1 |
| SRTK* | 1103393 | ditto | D | TyrKc | IG4, IGv4, IGcam, IGL | ditto |
| Nematoda | | | | | | |
| RIG-3 | 7497864 | C. elegans | Ch | IGL | IG4, IGL, IGv9 | IGL, IG4, IGcam |
| Arthropoda | | | | | | |
| CG2198-PA | 17136222 | D. melanog. | I | IGcam | IG4, IGcam, IGL, IGv4, ig | IGcam, IGc2, IG4, IGL, ig |
| CG5308-PA | 28571650 | ditto | I | IG4 | IG4, IGL, IGcam, IGv9, IGc2 | IG4, IGL, IGcam, IGc2, IG9, ig |
| CG6867-PA* | 45549584 | ditto | I | OLF | IGcam, IGL, IG4, IGc2, IG9, ig, IGv4 | OLF, IG4, collagen, IGL, IGcam, IGc2 |
| CG10095-PA | 24646195 | ditto | I | IGL | IGL, IG4, IGv4, ig, IG9, IGcam | |
| CG12191-PA | 19923026 | ditto | I | IG4 | IG4, IGL, IGcam, IGc2, IGv9 | IG4 |
| CG12591-PA | 24644123 | ditto | I | IGL | IGL, IG4, IGcam, IG9, IGv4, IGc2 | |
| CG13439-PA | 61678347 | ditto | I | IG4 | IG4, IGL, IGcam, IGv9, IGc2 | IG4, IGL, IGcam, IG9, IGc2, ig |
| CG14162d1 | 24662101 | ditto | I | IG4 | IG4, IGcam, IGL, IGc2, IGv9 | Second Ig domain |
| CG14162d2* | 24662101 | ditto | I | IG4 | IG4, IGL, IGc2, IGcam, IG9, ig, IGv4, IGv9 | First Ig domain |
| CG14372-PA | 24646583 | ditto | I | IGc2 | IG4, IGL, IGv4 | IGc2, IG9, IGL, IGcam |
| CG14469-PA* | 24585904 | ditto | I | IGL | IGL, IG4, IGcam, IGc2, IG9, ig, IGv9 | |
| CG31062-PA | 24650621 | ditto | I | IG4 | IGL, IGv4, IG4 | IG4, IGc2, IGcam, IGL, ig, FN3 |
| CG31361-PB | 24646192 | ditto | I | IGcam | IGcam, IGL, IG4, IG9, IGc2, ig, IGv4 | IG4, IGL |
| Turtle p* | 14149050 | ditto | I | IGcam | IGv4, IG4 | IGcam, IG, IGL, IGc2, FN3, fn3, ig |
| EN12008 | 31243175 | A. gambiae | I | IG4 | IG4, IGcam, IGL, IGv9, IGc2 | IGcam, IG4, IGc2, IGL, ig, IG9 |
| Mollusca | | | | | | |
| ror* | 13430037 | A. califor. | G | TyrKc | IGL, IGcam, IG4, IGc2, IG9, ig, IGv9 | TyrKc, Pkinase, STKc, SPS1, KR, Fz |
| Nonvertebrate Chordata | | | | | | |
| NICIR* | 53147480 | E. burgeri | H | IGv4 | IGv4 | |
| VCBP5* | 24571192 | B. floridae | C | CBM_14 | IGv4, IG4, IGv9 | CBM_14, ChtBD2, IGL, IG4 |
| VDB** | 34101043 | B. lanceol. | C | IGv4 | IGv4, IGL, IG4, ig, IGv9, IG9 | |

a The majority of nonvertebrate chimerical IGv-related conserved Ig domain similarities (CICIDS) do not exhibit superior conserved domain similarity to the selecting IGv domains. The IGv-related similarity of the highest score concerned VDB (for details see IGv-Related Molecules in Metazoans [Results]).

b Only sponge (Geodia cydonium) molecules of reciprocal sequence similarities <90% are displayed here. For additional information see Table 1, IGv-Related Molecules in Metazoans (Results), and chapter WP5. Unusual abbreviations: d1 and d2 associated with names: the first and second Ig domains from the N-terminus of the selected molecules, respectively (CG14162-PAd2 is abbreviated CG14162d2); EN: ENSANGP000000; NICIR: novel ITAM-containing Ig superfamily receptor; turtle p: an example of several isoforms of turtle proteins; VCBP5: variable region-containing chitin-binding protein 5.

PSI-BLAST *iteration or **iterations with one or both IGv as query sequences (performed on the list of the displayed molecules) selects or select the segment(s) collocating with the corresponding IGv positions, respectively.

c Abbreviations of taxonomic groups: califor., californica; intestin., intestinalis; lanceol., lanceolatum; melanog., melanogaster; C, Cephalochordata; Ch, Chromadorea; D, Demospongiae; H, Hyperotreti; I, Insecta; G, Gastropoda.

d Each domain with detectable CICIDS is described separately, i.e., the same molecules can repeat here. Arrangement of the domains and usage of the “max” column follow the same rules as in Table 1. Boldface in max column: maximal domain similarity of the given molecule is a part of CICIDS. IGv4 and IGv9: variable (IGv) domains smart00406 and cd00099 (which is more related to Igs and T-cell receptors), respectively.

Knowledge-based approach complementary to the results of the searches described in the preceding section

Three conditions were required in our selection: (i) the similarity with any reference sequence (RfS; for details see Reference Sequences (RfS) Used in the Combined Search Related to Table 3, below) is not lower than Smin = 22.3 bits, when using BLASTP searches; (ii) at least one segment of conserved IG or IGv domain similarity exists; and (iii) at least one maximal score value of BLASTP similarities to Igs or T-cell receptor sequences achieves 40.0 bits. Whole RfS were compared here as query sequences. Entrez restriction strategy followed from the obligatory list of taxonomic terms: Metazoa[ORGN] NOT (Caenorhabditis[ORGN] OR Arthropoda[ORGN] OR Vertebrata[ORGN]).
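The three selection conditions can be expressed as a simple filter. The sketch below is only illustrative: the record fields, the candidate names, and the scores are hypothetical, and the Ig and T-cell receptor scores are assumed to be kept as two separate values of which the larger one is tested against the 40.0-bit limit. The Entrez restriction string is copied from the text.

```python
from dataclasses import dataclass

S_MIN_BITS = 22.3        # condition (i): minimal BLASTP score to any RfS
IG_TCR_MIN_BITS = 40.0   # condition (iii): minimal best score to Ig or TCR sequences

# Entrez restriction quoted from the Methods text.
ENTREZ_RESTRICTION = (
    "Metazoa[ORGN] NOT (Caenorhabditis[ORGN] OR Arthropoda[ORGN] "
    "OR Vertebrata[ORGN])"
)


@dataclass
class Candidate:
    """Hypothetical summary of one database hit."""
    name: str
    best_rfs_bits: float        # best BLASTP score against any reference sequence
    has_ig_or_igv_domain: bool  # at least one conserved IG or IGv domain similarity
    max_ig_bits: float          # best BLASTP score against Ig sequences
    max_tcr_bits: float         # best BLASTP score against T-cell receptor sequences


def passes_selection(c: Candidate) -> bool:
    """Apply conditions (i) to (iii) to one candidate."""
    return (
        c.best_rfs_bits >= S_MIN_BITS
        and c.has_ig_or_igv_domain
        and max(c.max_ig_bits, c.max_tcr_bits) >= IG_TCR_MIN_BITS
    )


# Hypothetical candidates, not data from the study.
hits = [
    Candidate("hit_A", 35.0, True, 46.0, 37.0),
    Candidate("hit_B", 21.0, True, 50.0, 45.0),   # fails (i)
    Candidate("hit_C", 30.0, False, 44.0, 41.0),  # fails (ii)
    Candidate("hit_D", 30.0, True, 36.0, 44.0),   # passes via the TCR score
]
selected = [h.name for h in hits if passes_selection(h)]
print(selected)
```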
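Table 3. Some antigen receptor-related proteins in nonvertebrate Coelomata (a,b)

| Protein name | GI | Species (taxonomy) | Group (a) | Dominant domain similarities (b,c) | RfS (d) | Ig/T (b) | Superior vertebrate BLP/TBLN score (a) |
|---|---|---|---|---|---|---|---|
| Protostomia | | | | | | | |
| apCAM | 283583 | A. californica | M | IGL*, IGcam, FN3* | N4, N9, T | 46/37 | 226 (NCAM-140) |
| leechCAM | 2275262 | H. medicinalis | A | IGcam*, IG4*, FN3*, IGc2 | N9, S, T | 40/36 | 274 (sim NCAM2) |
| RTPh | 2695655 | ditto | A | PTPc*, FN3*, IGcam* | T | 43/32 | 1041 (RTPhdelta) |
| Tractin | 2275260 | ditto | A | IGcam*, IG4, FN3*, IGL, fn3 | IW, N4, S, T, Vp | 48/38 | 444 (KIAA0756) |
| MDM | 1373431 | L. stagnalis | M | IGcam* | N9 | 47/37 | 100 (L1 CAM) |
| ror | 13430037 | ditto | M | TyrKc, KR, IGL, Fz | T | 36/44 | 671 (ror1) |
| Nonvertebrate Chordata | | | | | | | |
| COS2.1 | 1518213 | C. intestinalis | U | IG4, IGcam | T | 42/35 | 88 (hemicentin1) |
| FGFR | 11037736 | Ha. roretzi | U | TyrKc, IGcam, IGL* | N9, T | 35/42 | 605 (KGFR2) |
| VCBP5 | 24571192 | B. floridae | C | CBM_14, IGv4, IGL | N4, N9 | 42/38 | 57 (chitinase2) |
| VDB | 34101043 | B. lanceolatum | C | IGv4 | N4, N9, T, Vp | 45/52 | 52 (TCRalpha) |
| Reference samples | | | | | | | |
| GCSAMS | 5730150 | G. cydonium | P | IG4* | IW, S, T, Vp | 50/40 | 52 (HSPG2) |
| TCRL | 51827413 | P. marinus | V | IGcam, IG4 | S | 44/47 | 61 (UNQ2770) |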

Reference samples

       

  GCSAMS

5730150

G. cydonium

P

IG4*

IW, S, T, Vp

50/40

52 (HSPG2)

  TCRL

51827413

P. marinus

V

IGcam, IG4

S

44/47

61 (UNQ2770)

a None of the molecules containing exclusively domains of antigen receptor (AR) relationship, i.e., COS2.1, GCSAMS, VDB, VCBP5, TCRL, and molluscan defense molecule (MDM; a possible mediator of non-self-recognition [Hoek et al. 1996]), exhibits extensive family-like similarities (score higher than 200 bits) to vertebrate sequences (maximal BLASTP or TBLASTN similarities are shown in the last column). For a possible explanation see Questions and Possibilities (Discussion). Molecules of chordate Deuterostomia origin did not pass the Ig/T limit of this table. FGFR, fibroblast growth factor receptor; A, Annelida; M, Mollusca; U, Urochordata; V, Vertebrata; C, Cephalochordata; D, Demospongiae.

b Three conditions were required in our selection: (i) BLASTP similarity to any reference sequence (see below), (ii) conserved domain similarity to IG or IGv domains, and (iii) maximal score value of BLASTP similarities with Ig/T-cell receptor sequences of at least 40 bits (for details see the second section of Materials and Methods).

c Only regionally dominant conserved domain similarities are shown here. These similarities are arranged in the order of their bit scores. Fz, Fz domain; KR, kringle domain; RTPh, receptor tyrosine phosphatase; sim, similar to. *Repeating dominant conserved domain similarities.

d Reference sequences (RfS). IW: IgW from sandbar shark (for details see Additional Important Structural Relationships [Results]); N4: zebrafish NITR1l (NP_938161) related to smart00409; N9: rainbow trout NITR2 (AAL83815) related to cd00099; S: human SIRP-alpha1 (CAA71403); T: lamprey T-cell receptor-like molecule (AAU09668); Vp: lamprey Vpre-B-like protein (AAT90420). For details see Reference Sequences (RfS) Used in the Combined Search Related to Table 3 (Materials and Methods).

PSI-BLAST procedure further selecting IGv-related sequences

The protocol was in agreement with the recommended prealignment strategy (Simossis et al. 2005). In accordance with Table 2 and in coincidence with critical domain similarities of molecules displayed in Table 3 (except for COS2.1), the sequence of the domain smart00409 was determined as the best possible query sequence for both multistep PSI-BLAST iteration procedures. In addition, we required collocation of two types of sequence similarities with selected segments: (i) conserved IGv (domain) similarities in the case of molecules described in Table 2 and (ii) BLASTP similarities with IGv in other cases. To avoid structural redundancy, only GCSAMS (model molecule in Traces of Hypermutation Milieu in the Selected IGv-Related Sequences of a Marine Sponge [Results]) and RTK (like GCSAML and GCSAMS, RTK is also upregulated during allograft fusion [Blumbach et al. 1999; Schutze et al. 2001]) represented the sponge adhesion molecules displayed in Table 2 after the first PSI-BLAST iteration.

Restriction of conserved region of IGv-related segments derived by the procedure described in “PSI-BLAST Procedure Further Selecting IGv-Related Sequences,” above

Two different multiple sequence alignments representing two independent methods (i.e., maximum likelihood [Muscle 2.01] and neighbor joining [CLUSTAL W 1.82]) were employed. To better restrict short segments of high similarity in blocks generated by multiple sequence alignment, we looked for the region of minimal presence of gaps, where common aa or aa of superior occurrence can also be found.
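The search for a region of minimal gap presence can be sketched as follows. This is a simplified stand-in, not the procedure used in the study: it only reports runs of alignment columns that contain no gap in any row and ignores the additional requirement of common aa or aa of superior occurrence. The function names and the toy alignment are hypothetical.

```python
def gap_free_runs(rows, gap="-"):
    """Return half-open (start, end) column ranges without any gap character.

    `rows` are aligned sequences of equal length.
    """
    if not rows:
        return []
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")

    runs, start = [], None
    for col in range(width):
        clean = all(row[col] != gap for row in rows)
        if clean and start is None:
            start = col                  # a gap-free run begins here
        elif not clean and start is not None:
            runs.append((start, col))    # the run ended at the previous column
            start = None
    if start is not None:
        runs.append((start, width))
    return runs


def longest_gap_free_run(rows, gap="-"):
    """Pick the longest gap-free run, or None when every column has a gap."""
    runs = gap_free_runs(rows, gap)
    return max(runs, key=lambda r: r[1] - r[0]) if runs else None


# Toy alignment (hypothetical), not taken from Table 4.
toy = ["AC-GTYC", "ACTGTYC", "A--GTFC"]
best = longest_gap_free_run(toy)
print(best, [row[best[0]:best[1]] for row in toy])
```

Running the same function on the output of two alignment programs and intersecting the resulting residue ranges would be one way to mimic the requirement that the region be gap-free in both alignments.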

Important Sequences and Employed Templates

Sequences of conserved Ig domains as PSI-BLAST query sequences

The Ig domain regions containing all the segments exhibiting RPS BLAST-derived similarities with the sequences considered for upload to PSI-BLAST were determined. These regions were employed as query sequences instead of the whole domain sequences (simplified in the following text).

Reference sequences (RfS) used in the combined search related to Table 3

In accordance with literature data, both nonrepeated (unique) available sequences of TCRL and VpBLP protein sequences were included in our RfS set. IgW and NITR representatives were selected by PSI-BLAST searches. Two IgW segments of the highest scores of similarity to both different IGv (smart00406 and cd00099) were put in these searches as query sequences. Since a similar PSI-BLAST search was not successful in the case of the SIRP family, the SIRP molecule of the best position in all individual BLASTP searches with accessible IgW query sequences represented a given family as an RfS. Complete protein sequences determined by preceding segment similarities were used as RfS in our combined search (Table 3). For a more detailed overview related to our choice of RfS see WP2.3.

Formation of hybrid templates

Hybrid templates are derivatives of original MSA-derived templates (described in WP2.2) formed in a two-step process including reduction and projection. Only the columns with single species of non–randomly occurring aa, all usual or reevaluated high occurrence aa (but not mhoaa; Table 4; also WP3.2, WP3.3) and SO aa (described together with RA aa in Amino Acids (aa) of Substantially Different Column Occurrences, above) were directly integrated into hybrid template sequence (HTS) in the first step. In the second step, we compared RA aa, which cannot be classified as high occurrence aa, with a preselected RfS (the segment of well-correlated IgW is used below). In the cases of identities between RA aa and the corresponding aa of RfS, we kept only identical aa at given HTS positions. On the other hand, all (nondecided) RA aa at positions without such aa identity remained in HTS. To diminish false positivity in database searches with the resulting HTS (following from the inclusion of questionable aa described in WP3.3), we used here the PHI BLAST program (for details see below).
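The reduction and projection steps can be pictured with a short sketch. It is a loose illustration under stated assumptions, not the authors' implementation: each column is assumed to arrive already classified into "decided" aa (integrated in the first step) and residual alternative aa (RA aa), the PROSITE-style output is one plausible way to write a pattern for a pattern-based search, and all names and the toy columns are hypothetical.

```python
def project_column(decided, residual, ref_aa):
    """Resolve one template column.

    decided  -- aa integrated directly in the first (reduction) step
    residual -- RA aa left undecided after the first step
    ref_aa   -- aa of the reference sequence (RfS) at this column
    """
    if decided:
        return set(decided)
    if ref_aa in residual:
        return {ref_aa}        # identity with the RfS: keep only that aa
    return set(residual)       # no identity: all RA aa remain in the template


def to_pattern(columns):
    """Write resolved columns as a PROSITE-style pattern string."""
    parts = []
    for aa in columns:
        if not aa:
            parts.append("x")                     # unconstrained position
        elif len(aa) == 1:
            parts.append(next(iter(aa)))
        else:
            parts.append("[" + "".join(sorted(aa)) + "]")
    return "-".join(parts)


# Toy columns (hypothetical): (decided aa, RA aa, reference aa).
toy_columns = [
    ({"L"}, set(), "I"),
    (set(), {"Q", "R", "T"}, "V"),   # no identity with the reference
    (set(), {"D", "S"}, "S"),        # reference aa is among the RA aa
    (set(), set(), "A"),             # nothing retained
]
resolved = [project_column(d, r, ref) for d, r, ref in toy_columns]
print(to_pattern(resolved))
```

Positions that keep several RA aa correspond to ambiguity symbols such as B or Z in a template sequence, while empty positions play the role of X.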
Table 4. Conserved regions in the selected nonvertebrate IGv-related segments (a,b)

Clustal W (1.82) multiple sequence alignment

ror                  T----AWGS------------RLKINDVRPSDSAVYTCKAENDFGNEETSGSLTVL- 86*177
apCAM                -------G-------------VLTINPLKTTDQATYTCIATNKGGFAESSNTLDV-- 80*300
Tractin              -----THGN-------------LLVSNLQLSDSGNYICFASNKFGNDSVGANLIV-- 81*516
leechCAM             -----EDG--------------LLIKNITTEDDGIYQCSAN--VEND---------- 54*200
RTK                  TL---SNGSVSSS--EKVALSQLTIFNVTAADEGEYTCSVD---GESASF------- 92*129
GCSAMSd2             TL---SNGSVSSS--DKVALSQLTIFNVTVADEGEYTCSVD---GESASF------- 92*206
CG14162d2            -----EKGDV--------TTSFLLIQNADLADSGKYSCAPS---NADVASVRVHVLN 90*273
CG14469-PA           -----TPGPR--------TQSRLIIREPQVTDSGNYTCSAS---NTEPAS------- 82*140
CG6867-PA            -----PEGFK--------TTMRLTISNLRKDDFGYYHCVAR----NE---------- 69*612
MDM                  -----SDGR------------ALTIRSVTGSDQKKYYCSASNSAGFAGPHAVFLNV- 80*229
IgW                  APG--IEGRFTPS--VVSNTAYLEITSLSVTDTAIYYCA------------------ 87*112
VCBP5                -----GSGSFT---------PTLTITDIRPSDSGRYWCAPDISEDYSNLG------- 60*281
VDB                  AG---YQGRVTFIGDLSTGVANIRLSNMQTEDSGSYTCSVTVFGDGQDSQSITVTV- 103*140
GCSAMSd1             -----STNTHS---------SSLVISGLRYSDAGDYMCTVE---------------- 77*82
NITR2                CEKSPEAGSPTQS-----CVYNLPKRNLTLSDAGTYYCAVASCGEILFGNRTKLDV- 99*262
Common aa (., : ,*)  . : * * *
hoaa                 L-I-N----D-G-YTC-A--
CRCL(*)              ******************
IgW/CRCL             LEITSLSVTDTAIYYCAR

Muscle (2.01) multiple sequence alignment

ror                  TAWGS---RLK-------------INDVRPSDSAVYTCKAE--NDFGNEETSGSLTVL 86*177
CG14162d2            EKGDVTTSFLL-------------IQNADLADSGKYSCAPSNADVASVRVHVLN---- 90*273
CG6867-PA            PEGFKTTMRLT-------------ISNLRKDDFGYYHCVARNE--------------- 69*612
Tractin              ------HGNLL-------------VSNLQLSDSGNYICFASNKFGNDSVGANLIV--- 81*516
leechCAM             ----KVEDGLL-------------IKNITTEDDGIYQCSANVEND------------- 54*200
apCAM                ---------LT-------------INPLKTTDQATYTCIATNKGGFAESSNTLDV--- 80*300
CG14469-PA           TPGPRTQSRLI-------------IREPQVTDSGNYTCSASNTEPAS----------- 82*140
MDM                  -YTLSSDGRAL------------TIRSVTGSDQKKYYCSASNSAGFAGPHAVFLNV-- 80*229
RTK                  NGSVSSSEKVA--------LSQLTIFNVTAADEGEYTCSVDGESASF----------- 92*129
GCSAMSd1             STNTHSSSLV--------------ISGLRYSDAGDYMCTVE----------------- 77*82
GCSAMSd2             NGSVSSSDKVA--------LSQLTIFNVTVADEGEYTCSVDGESASF----------- 92*206
VDB                  IPSAGYQGRVTFIGDLSTGVANIRLSNMQTEDSGSYTCSVTVFGDGQDSQSITVTV-- 103*140
IgW                  EFAPGIEGRFT--PSVVSNTAYLEITSLSVTDTAIYYCA------------------- 87*112
NITR2                EKSPEAGSPTQ------SCVYNLPKRNLTLSDAGTYYCAVASCGEILFGNRTKLDV-- 99*262
VCBP5                DGSGSFTPTLT-------------ITDIRPSDSGRYWCAPDISEDYSNLG-------- 60*281
Common aa (*)        * * *
hoaa                 --I-N----D-G-YTC-A-N
rhoaa                L-I-N----D-G-YTC-A-N
CR (*)               ****************
HTS1                 LTISNLBVSDSGXYTCSAZN
IgW/PBB              ITSLSVTDTAIYYCAR
PBB/IgW              FSSLTGYDLEWTYCAR
PBB/tmaa             FSSLTGYDLEWTYCAR

aDisplayed C-terminal parts of selected PSI-BLAST-derived segments contain the conserved regions (CR; underlined) of high similarity, without gaps present in both multiple sequence alignments. These regions possibly correspond to the primordial building block of variable immunoglobulin heavy chains described by Ohno et al. (1982). For details see Formation of Hybrid Templates (Materials and Methods), Conserved Regions Within the Segments Related to Variable Ig Domains (Results), WP2.2 and WP5.1, and chapter WP3.

bThe first and second numbers after the sequences denote C-terminal positions of displayed peptide chains in PSI-BLAST derived segments and whole molecules, respectively. hoaa (see below) of both alignments restrict the regions with compared aa. CRCL—Clustal W derived conserved regions; hoaa—high occurrence aa limited by a length equivalent value of three; IgW/PBB, IgW/CRCL—completed IgW segments similar to PBB (C-terminal arginine was not present in our PSI-BLAST searches) or of CRCL extent, respectively; HTS1—hybrid template sequence 1 constructed here according to Formation of Hybrid Templates (Materials and Methods) (B = Q,R,T and Z = D,S); PBB—Ohno’s modern intact primordial building block; PBB/tmaa—PBB identities with aa of template motif (tmaa) level (for details see WP3.2 and WP3.3); rhoaa—hoaa reevaluated in accordance with the results of PHI BLAST search (for details see Results of Template-Derived PHI BLAST Searches [Results]).

Results

Ig-like, IG, and IGv Domain Similarities: On-line Screening of Ig Domains from Unicellular Organisms to Nonvertebrate Craniata

Possible occurrence of Ig-like segments in Archaea proteins

Conserved similarities to typical bacterial Ig-like domains Big_1, BID_1, can be found in predicted protein sequences of Archaea origin (Table 1). Archaeal BID_1/Big_1 similarities in molecules displayed in Table 1 are accompanied by conserved similarities with polycystic kidney disease domains (PKD), which are very frequent in archaeal surface proteins (also in agreement with three-dimensional [3D] studies of archaeal surface layer proteins [Jing et al. 2002]). Interestingly, weak cross-similarities between segments similar to bacterial Ig-like domains and given PKD can be observed when comparing MM1983 (Table 1; Deppenmeier et al. 2002) with the extensive set of intimin (Table 1; see also the following paragraph) sequences on BLASTP.

Ig-like and Ig domains in the sequences of proteins from additional unicellular organisms

The majority of typical bacterial Ig-like domains described in Table 1 are cell surface proteins, which mediate specific interactions. Two of these molecules were extensively investigated. Different alleles and isoforms of BID_1/Big_1- and BID_2/Big_2-related proteins of the intimin family from E. coli interact with the enterocyte cytoplasmic membrane of various mammals. Some isoforms or alleles of the intimin family participate in pathogenetic processes caused by virulent E. coli strains such as enteritis and hemorrhagic enterocolitis (China et al. 1999; Zhang et al. 2002). Invasins having the same Ig-like domain similarity and coming from Yersinia pathogens are required for efficient invasive translocation of corresponding bacterial cells through intestinal epithelium to Peyer’s patches (Dersch and Isberg 2000). In addition to typical bacterial Ig-like domains, conserved metazoan ones, e.g., Ig, IgC2, IGcam, and IgL, were also detected in the sequences of bacterial proteins (last part of Table 1) but not in our screening of protozoan and yeast proteins. Despite this result, other different structural relationships to Ig domains or Ig superfamily were described in several papers dealing with unicellular Eukaryota (Wojciechowicz et al. 1993; Chiang et al. 2001, 2002; Sheppard et al. 2004).

IGv-related molecules in metazoans

The results of our search and comparison of nonvertebrate chimerical IGv-related conserved Ig domain similarities (CICIDS) are shown in Table 2. Selected fruit fly protein GC2198 and molecule VCBP5 of Branchiostoma floridae origin are secreted proteins (Cannon et al. 2002; Vogel et al. 2003), and isoforms of turtle proteins are secreted or transmembrane proteins, whereas the membrane relationships of the anopheles molecule and fruit fly protein CG6867 are as yet unknown (Table 2). All the other molecules listed in Table 2 are (or are predicted to be) cell surface proteins (Blumbach et al. 1999; Teichmann and Chothia 2000; Cannon et al. 2002; Sato et al. 2003; Vogel et al. 2003). In contrast to indicated CICIDS and to the close phylogenetic relationship to Igs and T-cell receptors (Sato et al. 2003), five hydrophobic (potentially transmembrane) segments were detected in the sequence of VDB (last item in Table 2).

The bit score of similarities between IGv smart00406 and VDB (53.1 bits) or the minimum bit score (54.6 bits) related to NCBI protein bank accessible IgW (molecules close to primordial Igs [Bernstein et al. 1996b]) were distinct from other bit score values (interval, 33.4–43.8 bits) of displayed CICIDS. A two-tailed test indicates significant differences between the highest IGv (smart00406)-related score of VDB and the scores of other IGv-related similarities (p < 0.001) or the other smart00406 ones (p < 0.01) displayed in Table 2.

CICIDS of improved linkage to IGv were selected based on the reevaluation of RPS BLAST results, and using PSI-BLAST iteration on the set of molecules described in Table 2. Only RPS BLAST derived IGv similarities of bit score limited by 40 bits or double-domain IGv similarities were passed to the sequence subset of improved RPS BLAST relationship. On the other hand, successful overlapping between CICIDS and the segments selected in PSI-BLAST procedures (generated by corresponding IGv sequence query) was required in our latter procedure (selected CICIDS are denoted by asterisks in Table 2). The final double-selected subset of CICIDS then contained only 9 of 26 original CICIDS in Table 2, i.e., CICIDS of CG14162d2, VDB, 1 representative isoform of turtle proteins, and 6 CICIDS of four sponge adhesion molecules of Geodia cydonium origin displayed in Table 2. In addition to this double-selection result, GCSAMSd1, GCSAMLd1, and a proposed single Ig domain of VDB were passed through the bit score limit, exhibited double domain similarity, and were selected in both IGv-related PSI-BLAST procedures.

IGv (domain) smart00406 similarities were more frequent in Table 2 (18 items) and in their PSI-BLAST fraction (11 items) than IGv cd00099 (13 and 6 items, respectively). In accordance with this result, smart00406 is a pluripotent conserved domain of phylogenetic linkage to seven different Ig domains, whereas unipotent (cd00096-linked) IGv cd00099 resembles a terminally diversified domain (for details see RPS BLAST web pages). Nevertheless (and perhaps in agreement with the proposed terminal status), cd00099 exhibits a higher score of similarities to the model IgW sequence set related to primordial Igs (for details see Reference Sequences [RfS] Used in the Combined Search Related to Table 3 [Materials and Methods]). Unfortunately, almost all IGv similarities presented in Table 2 are not regionally dominant. Most frequently, the Ig domain more related to constant chains [IG] smart00409 displays dominant Ig domain similarity to IGv-related segments, which may imply a closer relationship of this domain to ancestor structures of selected CICIDS and both IGv.

Traces of Hypermutation Milieu in the Selected IGv-Related Sequences of a Marine Sponge

Occurrence of hypermutation-related tetranucleotides in CDR1-like segments of GCSAMS

Our original (Kubrycht et al. 2004) and more recent searches concerned hypermutation-related tetranucleotides (i.e., hypermutation tetranucleotides [HT] and their single mutants [SM]) derived from both recent and former hypermutation motifs, i.e., RGYW/WRCY ([Rogozin and Kolchanov 1992; Dorner et al. 1998a] R = A,G) and DGYW/WRCH ([Rogozin and Diaz 2004] D = A,G,T; H = A,C,T; W = A,T; Y = C,T), respectively. These searches revealed the presence of some of these structures in GCSAMS(cdr1L.1), a longer CDR1-like segment of the first Ig domain of GCSAMS (for structural details see WP2.1; Kubrycht et al. 2004), whereas only a single occurrence of the cytosine- (and also AID)-unrelated HT TATT (Dorner et al. 1998b) was found.
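The degenerate motifs quoted above can be expanded into concrete tetranucleotides and counted along a sequence. The following is a minimal sketch only: the helper names (`expand`, `single_mutants`, `count_hits`) and the test sequence are hypothetical, and "single mutant" is assumed here to mean a one-base substitution of a motif member, which may be broader or narrower than the definition used in Kubrycht et al. (2004).

```python
from itertools import product

# IUPAC codes for the letters used by the motifs quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "W": "AT", "D": "AGT", "H": "ACT"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def expand(motif):
    """Return the set of concrete oligonucleotides encoded by an IUPAC motif."""
    return {"".join(p) for p in product(*(IUPAC[c] for c in motif))}

def reverse_complement(seq):
    """Reverse complement of a plain A/C/G/T sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def single_mutants(words):
    """Assumed definition: sequences one substitution away from a member,
    excluding the members themselves."""
    out = set()
    for w in words:
        for i, base in product(range(len(w)), "ACGT"):
            if base != w[i]:
                out.add(w[:i] + base + w[i + 1:])
    return out - set(words)

def count_hits(seq, words, k=4):
    """Count overlapping k-mers of seq that belong to words."""
    return sum(seq[i:i + k] in words for i in range(len(seq) - k + 1))

rgyw = expand("RGYW")
# Hypothetical 8-nt sequence, used only to exercise the functions.
print(len(rgyw), count_hits("AAGGCATT", rgyw))
```

Expanding both strands separately (RGYW and WRCY, or DGYW and WRCH) is equivalent to expanding one motif and taking reverse complements, which the sketch lets one verify directly.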

Because of the low power of Fisher’s test under our limiting conditions (in accordance with Lepš 1996), we used a combined two-parameter approach in our statistical evaluation of HT and their SM occurrences. We required reliability in the chi-square test with the continuity correction, which is usually recommended in the case of low mean values (Lepš 1996), and sufficient sample size in accordance with the binomial approach to short motifs (Kubrycht et al. 2004). Our approach revealed a significantly (p < 0.01) increased number of GGCA HT and SM in GCSAMS(cdr1L.1) relative to their occurrence in GCSAMS(cdr1L.2) or the expected random value. Despite the problematic sample size, the occurrence of the complementary SM pair AGTA/TACT is markedly increased. Both AGTA and TACT SM are present four times in two antiparallel pentadecanucleotide segments of GCSAMS(cdr1.1.com) but are absent in whole GCSAMS(cdr1L.2) as well as in the longer complementary part of GCSAMS(cdr1L.1). AGTA/TACT is also a unique pair which contains two important trinucleotide structures: the phylogenetically important motif AGY involved in hypermutation of Ig and nurse shark antigen receptors (Diaz and Flajnik 1998) and a unique cytosine-containing AGY-unrelated trinucleotide pair TAC/GTA correlated with Ig hypermutation (Dorner et al. 1998a).
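For orientation, a chi-square test with a continuity correction on a 2 × 2 table can be sketched as below. This assumes the standard Yates form of the correction; the function name and the counts are hypothetical and are not data from this study. For one degree of freedom the upper-tail probability is obtained from the complementary error function.

```python
import math

def chi2_yates(a, b, c, d):
    """Chi-square statistic with Yates continuity correction for the table
    [[a, b], [c, d]], plus its p value for 1 degree of freedom.
    All four marginal totals must be non-zero."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    # The correction subtracts n/2 from |ad - bc|, never going below zero.
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected ** 2 / margins
    # Survival function of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical counts: motif present / absent in two compared segments.
chi2, p = chi2_yates(12, 3, 2, 13)
print(round(chi2, 3), p < 0.01)
```

The correction lowers the statistic relative to the uncorrected test, which makes the result more conservative when the expected counts are small.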

The segment almost identical to CDR1-like segment GCSAMS(cdr1.1.com) exhibits the least similarity to mammalian mRNA sequences within the preformed GF region

Since the regions investigated here are related to CDR1, we assumed that the occurrence of selected oligonucleotide mRNA dissimilarities (ORD) within GF should be first of all related to the mutation instability (see also WP2.1). Consequently, we observed the distribution of sites without sequence identities to human and mouse mRNA oligonucleotides of inferior possible lengths (IROIL) within GF. The resulting profiles are displayed in Fig. 1. ORD occur predominantly in the DIF1 segment (GP: N166–192). This local occurrence of ORD was significantly more frequent than that in the complementary part of GF (p < 0.01; chi-square evaluation for 2 × 2 tables [Lepš 1996]) or its envelope regions (ER) also mentioned in Fig. 1. Significant differences (p < 0.01) were also determined in the case of a similar evaluation of both ER of GCSAMS(cdr1.1.com) located at positions closely related to DIF1 (GP: N167–195 and N168–194). In addition, the important difference between DIF1 and the model neighbor segment of equal length, DIF2 (GP: N139–165), was also found in further comparative BLASTN searches, within the GF region. The DIF2-related odds ratios 2.6 and 4.9 (p < 0.01) suggest respective predominant dissimilarities of mouse and human mRNA sequences (encoding molecules of different names) with DIF1. The difference was related to the frequencies of the corresponding oligonucleotide identities (OI; length of 13 nucleotides and higher) in the nonvertebrate metazoan sequence set, where only a 1.2 times higher frequency of OI than in DIF1 was found in DIF2.

Knowledge-Based Approach in the Subset of Coelomate Proteins Complementary to Preceding Ig Domain Screening

Similarities between selected and reference sequences

Six and four of ten selected molecules exhibited simultaneous BLASTP similarities with two or three different RfS, respectively (Table 3). Four and five such similarities were found in the cases of VDB and tractin, respectively. These frequent double and multiple BLASTP similarities suggest common structural features in the sequences of the presented molecules and RfS. TCRL represented the most frequently similar RfS in our BLASTP searches. Eight of ten selected molecules were similar to TCRL (for details see Table 3). These similarities differently overlapped with short IGv- and full length IGcam-related segments of TCRL located at aa positions 84–119 (TIGV) and 158–228 (TIGCAM), respectively. VDB similarity was a unique one which overlaps with the whole TIGV. The segments of leechCAM, RTPh, and apCAM partially overlapped with TIGV and fully overlapped with TIGCAM. Partial overlaps of both TIGV and TIGCAM were seen in their similarities to COS2.1 and tractin, whereas a single partial overlap of TIGV was found in the case of ror similarity. FGF receptor and the other tractin segments completely overlapped with TIGCAM.

Additional important structural relationships

Both RfS-related PSI-BLAST searches within the IgW sequence set selected the same IgW heavy chain sequence from the sandbar shark of clonal name AAB03680, which suggests an improved model importance of this RfS. This relationship appears to be important also from the point of view of construction of the hybrid template used in the following section. All the molecules without family-like similarities to vertebrate sequences displayed in Table 3 contained exclusively Ig domains (for a possible explanation see Questions and Possibilities [Discussion]).

Conserved Regions Within the Segments Related to Variable Ig Domains

Multiple sequence alignments of IGv-related molecules preselected by the double PSI-BLAST-derived procedure

Fifteen segments were selected by the procedure described in PSI-BLAST Procedure Further Selecting IGv-Related Sequences (Materials and Methods). Subsequent multiple sequence alignments performed by CLUSTAL W 1.82 and MUSCLE 2.01 (for complete record of alignments see WP3.1) enabled us to locate common aa and generate a hybrid template sequence (HTS1 in Table 4). Primary derived conserved regions (CR) were found in the C-terminal parts of selected IGv-related segments in accordance with the third section of Materials and Methods. Their positions also corresponded to the C-terminal position(s) of the compared Ig domain(s). CR contained three different common aa and did not contain any gaps.

Relationship between CR and Ohno’s primordial building block

High-density (more than 75% identity) sequence similarities between the CR segment of model (reference sequence) IgW (CRIgW; Table 4) were found at three positions of accessible IgW sequences, i.e., 65–79/68–82, 80–94/81–95, and 98–112/99–113 (original CRIgW position was 98–112). This result was further confirmed by comparisons between Igs and symmetrically extended 25-aa- and 35-aa-long CR-containing segments of model IgW. Corresponding data permitted us to perform more reliable searches, which indicated prevailing occurrences of similarities at the second and the third alternative positions. Interestingly, the second alternative position of given CRIgW similarities in IgW corresponded simultaneously to the position (aa 83–98) of the peptide chain encoded by Ohno’s 48-base-long “modern intact primordial building block” (PBB) derived from sequences of Ig variable region genes (Ohno et al. 1982) and to the position (aa 81–95) in IgW chains corresponding to dominant BLASTP and BLASTX similarities related to PBB. For additional consistent relationships between PBB and CR sequences see Table 4. In conclusion, CRIgW (Table 4) appears to be a PBB homologue, possibly closer to the conserved sequence of AR ancestor. This possibility is also in agreement with the results of the following phylogram study.
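The identity threshold used in this comparison can be illustrated with an ungapped sliding-window scan. The sketch below is an assumption about how such a scan might be organized, not the BLAST-based procedure actually applied; the function names are hypothetical, and the only sequences used are the IgW/PBB, PBB/IgW, and IgW/CRCL rows of Table 4.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length segments."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def windows_above(query, target, threshold=0.75):
    """Return (1-based start, identity) for every ungapped window of target
    whose identity with query is strictly greater than threshold."""
    k = len(query)
    hits = []
    for i in range(len(target) - k + 1):
        frac = identity(query, target[i:i + k])
        if frac > threshold:
            hits.append((i + 1, frac))
    return hits

# Rows IgW/PBB and PBB/IgW of Table 4 (16 aa each).
print(identity("ITSLSVTDTAIYYCAR", "FSSLTGYDLEWTYCAR"))
# Short illustrative query scanned along the IgW/CRCL row of Table 4.
print(windows_above("YYCA", "LEITSLSVTDTAIYYCAR"))
```

Applied to the two 16-aa rows, the count of identical positions is 7 of 16, concentrated at the C-terminal YCAR end.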

Approach to phylogenetic analysis of variable and IGv-related segments

In accordance with our phylogram-based frequency analysis (chapter WP4), only conserved and PSI-BLAST derived segments (PBDS) of apCAM and MDM origin exhibited closer overall linkage to IgW than NITR2 (a molecule with a published very close phylogenetic relationship to AR [van den Berg et al. 2004]) in our 40 phylograms. Despite this, the subset of phylograms constructed based on CLUSTAL W-derived multiple sequence alignments with PBDS indicated the closest phylogram linkage between IgW and CG6867 segments, whereas only several (less frequent) close IgW phylogram linkages to apCAM, but not to MDM, were observed (WP4.3). In addition, at least comparable phylogram IgW relationships with that between the IgW and the VDB (a molecule selected three times in IGv-Related Molecules in Metazoans, above) segments were found in the cases of the ror, GCSAMSd1, VCBP5, and CG14469-PA segments (WPT1 in WP4.3).

Results of template-derived PHI BLAST searches

Hybrid template sequence HTS1 as a sequence query and two patterns, i.e., P1, including common aa Dx(3)YxC, and “single-tripled mutation-related” P2 (Dx(3)YxCx[AV]) (Table 4; see also WP3.2), enabled us to search for possible Ig-related conserved segments of nonvertebrate Metazoa origin (Formation of Hybrid Templates [Materials and Methods]). The searches with P1 resulted in 103 different segments. All but 15 of these segments contained an N-terminal leucine in addition to the selection pattern, thus forming the common structure Lx(8)Dx(3)YxC. Similarly, 33 of these segments contained N-terminal LTI. These facts led us to reevaluate the N-terminal aa of the HTS1 in accordance with the CLUSTAL W alignment and also to establish reevaluated high occurrence aa (rhoaa; Table 4).
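The patterns P1 and P2 follow the PROSITE-like syntax accepted by PHI BLAST, in which x denotes any residue and x(n) a run of n residues. As a rough illustration, such a pattern can be translated into a regular expression and applied to the ungapped rows of Table 4. The translation below is a simplified sketch with hypothetical function names; it handles only the syntax elements that occur in the patterns quoted here and does not reproduce PHI BLAST scoring.

```python
import re

def pattern_to_regex(pattern):
    """Translate a PROSITE-style pattern (letters, x, x(n), [..]) into a regex.
    Any uppercase letter is accepted for x, so ambiguity codes such as
    B, X, and Z in HTS1 are matched as ordinary symbols."""
    tokens = re.findall(r"x\(\d+\)|x|\[[A-Z]+\]|[A-Z]", pattern.replace("-", ""))
    parts = []
    for tok in tokens:
        if tok == "x":
            parts.append("[A-Z]")
        elif tok.startswith("x("):
            parts.append("[A-Z]{%s}" % tok[2:-1])
        else:
            parts.append(tok)  # literal residue or bracketed residue class
    return "".join(parts)

def find_pattern(pattern, aligned_seq):
    """Return (1-based start, matched segment) pairs in the ungapped sequence."""
    seq = aligned_seq.replace("-", "")
    regex = pattern_to_regex(pattern)
    return [(m.start() + 1, m.group()) for m in re.finditer(regex, seq)]

# Clustal W row of ror and the HTS1 row, both taken from Table 4.
ROR = "T----AWGS------------RLKINDVRPSDSAVYTCKAENDFGNEETSGSLTVL-"
HTS1 = "LTISNLBVSDSGXYTCSAZN"

print(pattern_to_regex("Dx(3)YxCx[AV]"))
print(find_pattern("Dx(3)YxC", ROR))
print(find_pattern("Lx(8)Dx(3)YxC", HTS1))
```

Because `re.finditer` reports non-overlapping matches only, a production scan over long sequences with repetitive motifs would need a lookahead-based variant; for the short segments of Table 4 this makes no difference.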

At least 7 of the top 10 layers (T10L; i.e., working item subset in which Expect or bit score values are higher than the values of the eleventh item) determined by 12 PHI BLAST searches contained specific segments (SPSE) of tractin and CG6867 displayed in Table 4 and also hemolin segments of Lymantria dispar (gypsy moth) origin (insect hemolins are proteins induced by bacterial infection and interact with lipopolysaccharide [Lindstrom-Dinnetz et al. 1995; Yu and Kanost 2002]). On the other hand, SPSE of MDM and GCSAMSd2 were present only two times among T10L. SPSE of other molecules such as RTK and VCBP5 occurred only in items outside the T10L subset. A maximum number of identities with rhoaa (9 of the possible 10) was found in SPSE of CG6867. Four molecules selected here (CG6867, GCSAMS, MDM, VCBP5) are also mentioned in the immediately preceding section.
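The T10L definition given above (items scoring above the eleventh item) can be written down as a short selection rule. The sketch assumes ranking by bit score, where higher is better; the function name and the hit list are hypothetical. Items tied with the eleventh item are excluded under this reading, so the subset may contain fewer than ten items.

```python
def top_layers(hits, rank=11):
    """hits: list of (name, bit_score) pairs.
    Return the hits whose score is strictly higher than that of the
    rank-th best hit; return all hits if fewer than rank are available."""
    ordered = sorted(hits, key=lambda h: h[1], reverse=True)
    if len(ordered) < rank:
        return ordered
    cutoff = ordered[rank - 1][1]
    return [h for h in ordered if h[1] > cutoff]

# Hypothetical search output with twelve distinct bit scores.
hits = [("seg%d" % i, float(score)) for i, score in enumerate(range(12, 0, -1))]
print(len(top_layers(hits)))
```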

The search with P1 in the set of all available bacterial proteins revealed only ZP_00561298 from Desulfitobacterium hafniense, a molecule displayed in Table 1. The region containing the selected segment of ZP_00561298 (S1) is predominantly related to smart00408 (IGc2). S1 and two segments of the molecule COG3210 of Cytophaga hutchinsonii origin (see also Table 1) were present in the results of P2-related searches. One of the selected COG3210 segments (S2; aa positions 1625–1640) was located in the region exhibiting the highest conserved domain similarity found in the given molecule. This similarity concerned smart00409, which represents an important IGv-related domain (for details see IGv-Related Molecules in Metazoans, above). Both S1 and S2 contained seven rhoaa like the PHI BLAST-derived segment of MDM mentioned above.

Discussion

Selected Molecules and Structures Related to Antigen Receptors

Instead of dominant high-score IGv similarities to N-terminal Ig domains of vertebrate AR and AR-related proteins, the sequence of IG (domain) smart00409 forms prevailing superior conserved domain similarities to nonvertebrate IGv-related protein segments (Table 2; see also IGv-Related Molecules in Metazoans [Results]). The corresponding possible relationship of smart00409 to IGv ancestor structure appears to be interesting, first of all, due to the similar general role of the IGc1 domain deduced from 3D studies of IGv (Du Pasquier et al. 2004). In addition, this possibility would also be in agreement with comparable results of parallel PSI-BLAST procedures with both IGv and single smart00409 displayed in Tables 2 and 4, respectively (provided that we disregard the consequences of antiredundant selection of sponge adhesion molecules before formation of the final set displayed in Table 4). The benefit of presented domain relationships is perhaps also in determined common aa and conserved regions, which could be interesting for future 3D reevaluations of IG and IGv architectures and interactivities. In addition, the C-terminal position of the conserved regions remarkably collocates with the AR segments undergoing recombination (for sequence comparisons see Relationship Between CR and Ohno’s Primordial Building Block [Results]).

In contrast to the highly sophisticated artificial sequence of smart00409, sequences of sponge molecules GCSAMS and GCSAML are actual AR-related sequences. Similarly to AR these highly homologous proteins participate in allograft reactions (Blumbach et al. 1999; Schutze et al. 2001) and include segments of conserved domain similarities to IGv (IGv-Related Molecules in Metazoans [Results]). GCSAMS as a representative of both these molecules is phylogenetically related to AR (see Approach to Phylogenetic Analysis of Variable and IGv-Related Segments and, also, Results of Template-Derived PHI BLAST Searches [Results]; Table 4) and its N-terminal segment also exhibits several structural relationships to CDR1 (see Traces of Hypermutation Milieu in the Selected IGv-Related Sequences of a Marine Sponge [Results], Fig. 1, and Kubrycht et al. [2004]). In addition, the source of GCSAMS and GCSAML, the marine sponge Geodia cydonium (as well as other Demospongiae species), is possibly related to the earliest common metazoan ancestor (Urmetazoa [Muller 1998; Muller et al. 2001; Wiens et al. 2003]). Moreover, evidence was presented allowing the conclusion that marine sponge (Demospongiae) proteins are more closely related to the corresponding molecules from H. sapiens than to those of C. elegans and D. melanogaster (Muller et al. 2001). Consequently, GCSAMS and GCSAML represent possible candidates for common ancient phylogenetic origin with AR. A similar relationship to AR also concerns chordate VDB, a nonvertebrate molecule of maximal RPS BLAST similarity to IGv described here. This molecule was also selected by all our PSI-BLAST searches and passed through all criteria described in the first section of Results and in Table 3. In accordance with published results of BLASTX comparisons (Sato et al. 2003) and also with our BLASTP searches, VDB is more similar to T-cell receptors than to Igs, whereas GCSAMS, mentioned above, is more similar to Igs (Table 3).
Despite the problematic phylogenetic relationship of corresponding species, several procedures used in Conserved Regions Within the Segments Related to Variable Ig Domains (Results) and WP4 suggest an interesting structural similarity between AR and the molluscan defense molecule (MDM). Like AR, MDM is considered to be a possible mediator of non-self recognition (Hoek et al. 1996). In contrast to the neuronal origin of all other protostomial molecules described in Table 3, this molecule is a unique one specifically expressed in granular cells located in connective tissue of mesoderm origin. In addition to preceding marked functional and structural similarity between AR and MDM, a question arises with respect to more general phylogenetic relationships among vertebrate AR, MDM, and some protostomial molecules involved in adhesion of neural cells. Hence apCAM and ror were selected in Approach to Phylogenetic Analysis of Variable and IGv-Related Segments (Results) and some additional protostomial molecules can be found in Table 4. Though the structurally based selection appears to be sufficiently exact, a broader reevaluation of protein interactivities has to be implemented in the future. We assume that particularly interactions with random peptides (Smith 1985; Rossenu et al. 1997) and recent prediction (Huang et al. 2004; Kim et al. 2004; Park et al. 2005) and experimental tools (Ito et al. 2001) of interactome analysis will be important to reevaluate recent sequence data, including also the sets of molecules described here.

Ig Domains of Unicellular Eukaryota

A recent study has revealed the existence of human molecules (putative cytoskeletal organizing protein TRIOBP [Riazuddin et al. 2006]) exhibiting conserved domain similarity to Ig/Ig superfamily-related domain Candida_ALS (pfam05792), frequently found in yeast molecules including α-agglutinin of Saccharomyces cerevisiae (Wojciechowicz et al. 1993; Sheppard et al. 2004). On the other hand, an absence of conserved domain similarities between proteins from unicellular Eukaryota and AR-related Ig domains (all Ig domains present in Tables 2 and 3 and two domains, IGc and IgGc1, astonishingly absent from the tables) is still observed. This follows not only from the database searches described under Materials and Methods, but also from the two-step (BLASTP and RPS BLAST) database screening using all not yet compared AR-related Ig domains as starting query sequences (data not shown). Only some special homologies of proteins of unicellular Eukaryota origin without unifying conserved domain similarity have been described, e.g., ICAM-L homologues were found in Leishmania (protozoan) genus (Chiang et al. 2001, 2002). Such a result need not represent a necessary contradiction. Hence new, more general (RPS-BLAST) and similar domains can appear in the future (similarly to the reevaluation of cytidine deaminase domains in the past).

The absence of conserved domain similarities between AR-related Ig domains and proteins of unicellular Eukaryota origin also contrasts with such similarities of bacterial proteins (Table 1). This unexpected difference may follow from more frequent symbiotic or parasitic interactions of bacteria with metazoans. Hence such interactions may potentiate topical convergent competitive changes of physical cell–cell interactions (including “reconstituted” or “imitating” interactions of metazoan proteins via similar segments of metazoan and bacterial Ig domains improving critical Ig domain similarities), or horizontal gene transfer of metazoan genes encoding proteins with critical conserved Ig domain similarities to bacteria (Ochman et al. 2000; Ray and Nielsen 2005), or even inverse transfer from bacteria to metazoans similarly to the postulated transfer of RAG1 genes during the “big bang” period of Ig germline gene reshuffling and rearrangement (Bernstein et al. 1996a; Marchalonis and Schluter 1998).

Questions and Possibilities

The relationship between hypermutating DNA regions (HDR) in somatic cells, i.e., the regions which also include hypermutating Ig exons (see also Introduction), and the more widely spread DNA hot-spot regions is as yet unknown. Nevertheless, some facts suggest a possible phylogenetic linkage between these nonstable DNA regions. Two types of somatic mutation structures (AGC/GCT involved also in a more specific Ig hypermutation and WAN) and GC dinucleotide related to meiotic mutation were found to be less frequently involved also in counterpart processes, i.e., in human meiotic and somatic mutations, respectively (Oprea et al. 2001). This means that both types of mutations, as well as hot spots and HDR, are still not completely separated in vertebrates. Interestingly, this fact and the assumed resistance of Ig domains composing AR-related molecules to hypermutation (and possibly also insertion/deletion changes) enable us to explain the absence of non-Ig domain similarities in the sequences of AR-related molecules less similar to vertebrate proteins described in Additional Important Structural Relationships (Results) and Table 3. Hence the gradual loss of function and elimination of the domain exons different from that encoding Ig ones can be expected in genes of more diversified sequences (less similar to recent vertebrates) included in hot spots or primitive HDR. The assumed resistance of AR-related Ig domains seems to be in accordance with the ability of antibodies to form alternative interactions (James et al. 2003) and a broad-range interactivity of molecules exhibiting IG and IGv conserved domain similarities (in addition, the existence of hypermutating AR genes). In the end, this consideration poses the question whether the genes of lower similarity to vertebrate molecules mentioned in Table 3 indeed hypermutate.

In our previous paper we hypothesized that DNA encoding some substrate, inhibitory or regulatory regions of protein kinases (PK), or a more general pattern or pattern-related DNA structure had a role in the formation of the ancestor CDR1 structure (Kubrycht et al. 2004; slightly updated). Interestingly, peptide PK substrates and inhibitors (PKSI) form an extensive set of sometimes very similar but distinctly interacting structures like hypervariable regions of antibodies CDR1 and CDR2 (Kubrycht et al. 2002). In addition, we can find the segments similar to PKSI in almost the same positions of N-terminal segments of Igs and GCSAMS (Kubrycht et al. 2002, 2004). These segments (PKSI-related regions) even overlap the N-terminus of CDR1 and the CDR1-like segment of GCSAMS (see Occurrence of Hypermutation-Related Tetranucleotides in CDR1-like Segments of GCSAMS [Results] and WP2.1). Besides the possible role of recombinant events, an interesting alternative based on convergent or combined convergent/recombinant events can also be considered, when trying to explain the given similarities (Kubrycht et al. 2004). This interesting alternative would follow from the recent concept of very ancient (more than 1000–500 million years ago) “innate immunity” including undiversified immune/clearance processes via ancient immune/preimmune molecules (Marchalonis et al. 2002). Hence the cross-similarity of PKSI-related regions mentioned above accentuates the question of their ancient inhibitory/binding cross-reactivity with the active centers of different PK (e.g., PK released by parasite organisms or dying cells). This interaction would lead to the proposed ancient clearance process, which diminishes possible dysregulating effects of PK via their binding to the PKSI-related region of cell surface PPCAR (more precisely early PPCAR) and subsequent PK destruction in lysosomes.
In this case, overlapping (perhaps weakly hypermutating) CDR1/DIF1-like segments (see Occurrence of Hypermutation-Related Tetranucleotides in CDR1-like Segments of GCSAMS [Results] and The Segment Almost Identical to CDR1-like Segment GCSAMS[cdr1.1.com] Exhibits the Least Similarity to Mammalian mRNA Sequences Within the Preformed GF Region [Results]) of PPCAR would then complete the necessary context of interaction (via additional recognition or sterical hindrance), restricting and/or spreading the repertoire of recognized PK.

Since the hybrid template in Results of Template-Derived PHI BLAST Searches (Results) selects bacterial protein COG3210 belonging to a group of large exoproteins involved in heme utilization and adhesion, the question of the role of diversified heme structures in the early evolution of variable Ig domains arises. In spite of it, more detailed structural analysis will be necessary to assume such possibility even in the case of the early PPCAR variant(s).

Methodological Aspects

Two different procedures were demonstrated in our local search for segments related to Ig hypermutation (see Traces of Hypermutation Milieu in the Selected IGv-Related Sequences of a Marine Sponge [Results] and Kubrycht et al. [2004]). These procedures represent only the initial stage of a more extensive future mapping of such segments. In accordance with this trend, the first program predicting the local occurrence of somatic mutations from the secondary structure of DNA was recently described (Wright et al. 2004). In addition to the most frequently used hypermutation tetranucleotides (Rogozin and Diaz 2004), a broader repertoire of structures has been linked to Ig hypermutation (Dorner et al. 1997, 1998a, b; Diaz and Flajnik 1998; Diaz et al. 1999; Oprea et al. 2001; Shapiro et al. 2003; Boursier et al. 2004; Duquette et al. 2005) and could be useful in future studies. Since the mechanism of Ig hypermutation was fully completed only in vertebrates, model organisms closer to the vertebrate lineage will also be necessary for further, more sophisticated phylogenetic research. The first example of such an animal model is possibly G. cydonium (Muller 1998; Muller et al. 2001), whose molecules were successfully correlated in this paper. The recently described monitoring of hypermutation using retroviral vectors carrying fluorescent proteins of different colors (Klasen et al. 2005) also offers a possibility for similar investigation of potentially competent or on-line predicted nonvertebrate cells (e.g., cells expressing the molecules mentioned in the first or second paragraphs of Questions and Possibilities or Selected Molecules and Structures Related to Antigen Receptors, respectively, above).
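
As an illustration of what a local search for hypermutation-related tetranucleotides can look like, the following minimal Python sketch scans a nucleotide sequence for the RGYW/WRCY and DGYW/WRCH motifs named in Rogozin and Diaz (2004). It is not the procedure used in this paper: the function names (`motif_regex`, `find_hotspots`) and the example sequence are hypothetical and chosen only for demonstration, and the motifs are expanded using the standard IUPAC ambiguity codes (R = A/G, Y = C/T, W = A/T, D = A/G/T, H = A/C/T).

```python
import re

# Standard IUPAC nucleotide ambiguity codes needed for the motifs below.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]",
    "D": "[AGT]", "H": "[ACT]",
}

# Hotspot motifs and their reverse complements (RGYW/WRCY, DGYW/WRCH).
MOTIFS = ("RGYW", "WRCY", "DGYW", "WRCH")


def motif_regex(motif):
    """Compile an IUPAC motif into a regex that also reports overlapping hits."""
    body = "".join(IUPAC[ch] for ch in motif)
    # A lookahead is zero-width, so matches starting at adjacent positions are all found.
    return re.compile("(?=(" + body + "))")


def find_hotspots(seq, motifs=MOTIFS):
    """Return sorted (0-based position, motif, matched tetranucleotide) tuples."""
    seq = seq.upper()
    hits = []
    for motif in motifs:
        for match in motif_regex(motif).finditer(seq):
            hits.append((match.start(), motif, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sequence used only to exercise the function.
    for position, motif, tetranucleotide in find_hotspots("ttgcaggcatagct"):
        print(position, motif, tetranucleotide)
```

Each motif is listed together with its reverse complement, so both DNA strands are covered by scanning a single strand. A count of such hits per segment is only a crude first indicator; the broader repertoire of structures cited above is not modeled here.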

Footnotes

  1.

    Abbreviations related to molecules may denote both protein and nucleotide sequences. Standardized BLAST searches and procedures are unambiguously denoted by usual abbreviations. CDR1—the first hypervariable region of Igs; GP—terminal GCSAMS positions of the observed peptide or oligonucleotide segments are marked with “aa” or “N,” respectively; GCSAM—sponge adhesion molecules of Geodia cydonium origin; GCSAMS and GCSAML—original abbreviations of cell recognition molecules from the sponge G. cydonium, also denoted GSAMS and GSAML, respectively; PPCAR—phylogenetic precursor(s) of hypothetical common ancestor of antigen receptors encoded possibly by rearranging gene; WP1.1 to WP5.5—sections of the web page http://www.papersatellitesjk.com, including also more detailed lists of abbreviations.

References

  1. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
  2. Azumi K, De Santis R, De Tomaso A, Rigoutsos I, Yoshizaki F, Pinto MR, Marino R, Shida K, Ikeda M, Ikeda M, Arai M, Inoue Y, Shimizu T, Satoh N, Rokhsar DS, Du Pasquier L, Kasahara M, Satake M, Nonaka M (2003) Genomic analysis of immunity in a Urochordate and the emergence of the vertebrate immune system: “Waiting for Godot.” Immunogenetics 55:570–581
  3. Banerjee M, Mehr R, Belelovsky A, Spencer J, Dunn-Walters DK (2002) Age- and tissue-specific differences in human germinal center B cell selection revealed by analysis of IgVH gene hypermutation and lineage trees. Eur J Immunol 32:1947–1957
  4. Beale RCL, Petersen-Mahrt SK, Watt IN, Harris RS, Rada C, Neuberger MS (2004) Comparison of the differential context-dependence of DNA deamination by APOBEC enzymes: correlation with mutation spectra in vivo. J Mol Biol 337:585–596
  5. Bernstein RM, Schluter SF, Bernstein H, Marchalonis JJ (1996a) Primordial emergence of the recombination activating gene 1 (RAG1): sequence of the complete shark gene indicates homology to microbial integrases. Proc Natl Acad Sci USA 93:9454–9459
  6. Bernstein RM, Schluter SF, Shen S, Marchalonis JJ (1996b) A new high molecular weight immunoglobulin class from the carcharhine shark: implications for the properties of the primordial immunoglobulin. Proc Natl Acad Sci USA 93:3289–3293
  7. Bjedov I, Lecointre G, Tenaillon O, Vaury C, Radman M, Taddei F, Denamur E, Matic I (2003) Polymorphism of genes encoding SOS polymerases in natural populations of Escherichia coli. DNA Repair 2:417–426
  8. Blumbach B, Diehl-Seifert B, Seack J, Steffen R, Muller IM, Muller WEG (1999) Cloning and expression of new receptors belonging to the immunoglobulin superfamily from the marine sponge Geodia cydonium. Immunogenetics 49:751–763
  9. Boursier L, Su W, Spencer J (2004) Analysis of strand biased ‘G’.C hypermutation in human immunoglobulin V(lambda) gene segments suggests that both DNA strands are targets for deamination by activation-induced cytidine deaminase. Mol Immunol 40:1273–1278
  10. Bradshaw PS, Condie A, Matutes E, Catovsky D, Yuille MR (2002) Breakpoints in the ataxia telangiectasia gene arise at the RGYW somatic hypermutation motif. Gene 21:483–487
  11. Bray N, Pachter L (2004) MAVID: constrained ancestral alignment of multiple sequences. Genome Res 14:693–699
  12. Brotcorne-Lannoye A, Maenhaut-Michel G (1986) Role of RecA protein in untargeted UV mutagenesis of bacteriophage lambda: evidence for the requirement for the dinB gene. Proc Natl Acad Sci USA 83:3904–3908
  13. Cannon JP, Haire RN, Litman GW (2002) Identification of diversified genes that contain immunoglobulin-like variable regions in a protochordate. Nat Immunol 3:1200–1207
  14. Cannon JP, Haire RN, Pancer Z, Mueller MG, Skapura D, Cooper MD, Litman GW (2005) Variable domains and a VpreB-like molecule are present in jawless vertebrate. Immunogenetics 56:924–929
  15. Chiang SC, Ali V, Huang AL, Chu KY, Lee ST (2001) Molecular, cellular and functional characterization of a novel ICAM-like molecule of the immunoglobulin superfamily from Leishmania mexicana amazonensis. Mol Biochem Parasitol 112:263–275
  16. Chiang SC, Chang SC, Lee ST (2002) ICAM-L gene is conserved only in Leishmania species in the family of kinetoplastida. Mol Biochem Parasitol 124:47–50
  17. China B, Jacquemin E, Devrin A-C, Pirson V, Mainil J (1999) Heterogeneity of the eae genes in attaching/effacing Escherichia coli from cattle: comparison with human strains. Res Microbiol 150:323–332
  18. Deppenmeier U, Johann A, Hartsch T, Merkl R, Schmitz RA, Martinez-Arias R, Henne A, Weizer A, Baumer S, Jakobi C, Bruggemann H, Lienard T, Christmann A, Bomeke M, Steckel S, Bhattacharya A, Lykidis A, Overbeek R, Klenk HP, Gunsalus RP, Fritz HJ, Gottschalk G (2002) The genome of Methanosarcina mazei: evidence for lateral gene transfer between Bacteria and Archaea. J Mol Microbiol Biotechnol 4:453–461
  19. Dersch P, Isberg RR (2000) An immunoglobulin superfamily-like domain unique to the Yersinia pseudotuberculosis invasin protein is required for stimulation of bacterial uptake via integrin receptors. Infect Immun 68:2930–2938
  20. Diaz M, Flajnik MF (1998) Evolution of somatic hypermutation and gene conversion in adaptive immunity. Immunol Rev 162:13–24
  21. Diaz M, Velez J, Singh M, Cerny J, Flajnik MF (1999) Mutational pattern of the nurse shark antigen receptor gene (NAR) is similar to that of mammalian Ig genes and to spontaneous mutations in evolution: the translesion synthesis model of somatic hypermutation. Internat Immunol 11:825–833
  22. Diaz M, Watson NB, Turkington G, Verkoczy LK, Klinman NR, McGregor WG (2003) Decreased frequency and highly aberrant spectrum of ultraviolet-induced mutations in the hprt gene of mouse fibroblast expressing antisense RNA to DNA polymerase zeta. Mol Cancer Res 1:836–847
  23. Di Noia JM, Neuberger MS (2004) Immunoglobulin gene conversion in chicken DT40 cells largely proceeds through an abasic site intermediate generated by excision of the uracil produced by AID-mediated deoxycytidine deamination. Eur J Immunol 34:504–508
  24. Dorner T, Brezinschek H-P, Brezinschek RI, Foster SJ, Domiati-Saad R, Lipsky PE (1997) Analysis of the frequency and pattern of somatic mutations within nonproductively rearranged human variable heavy chain genes. J Immunol 158:2779–2789
  25. Dorner T, Foster SJ, Farner NL, Lipsky PE (1998a) Somatic hypermutation of human immunoglobulin heavy chain genes: targeting of RGYW motifs on both DNA strands. Eur J Immunol 28:3384–3396
  26. Dorner T, Foster SJ, Brezinschek H-P, Lipsky PE (1998b) Analysis of the targeting of the hypermutational machinery and the impact of subsequent selection on the distribution of nucleotide changes in human VHDJH rearrangements. Immunol Rev 162:161–171
  27. Du Pasquier L, Zucchetti I, De Santis R (2004) Immunoglobulin superfamily receptors in protochordates: before RAG time. Immunol Rev 198:233–248
  28. Duquette ML, Pham P, Goodman MF, Maizels N (2005) AID binds to transcription-induced structures in c-MYC, that map to regions associated with translocation and hypermutation. Oncogene 24:5791–5798
  29. Faili A, Aoufouchi S, Flatter E, Gueranger Q, Reynaud C-A, Weill JC (2002) Induction of somatic hypermutation in immunoglobulin genes is dependent on DNA polymerase iota. Nature 419:944–947
  30. Faili A, Aoufouchi S, Weller S, Vuillier F, Stary A, Sarasin A, Reynaud C-A, Weill JC (2004) DNA polymerase eta is involved in hypermutation occurring during immunoglobulin class switch recombination. J Exp Med 199:265–270
  31. Gordon MS, Kanegai CM, Doerr JR, Wall R (2003) Somatic hypermutation of the B cell receptor genes B29 (Igβ, CD79b) and mb1(Igα, CD79a). Proc Natl Acad Sci USA 100:4126–4131
  32. Halaby DM, Mornon JPE (1998) The immunoglobulin superfamily: an insight on its tissular, species, and functional diversity. J Mol Evol 46:389–400
  33. Hoek RM, Smit AB, Frings H, Vink JM, de Jong-Brink M, Geraerts WPM (1996) A new Ig-superfamily member, molluscan defence molecule (MDM) from Lymnaea stagnalis, is down-regulated during parasitosis. Eur J Immunol 26:939–944
  34. Huang TW, Tien AC, Huang WS, Lee YC, Peng CL, Tseng CL, Kao CY, Huang CY (2004) POINT: a database for the prediction of protein-protein interactions based on the orthologous interactome. Bioinformatics 20:3273–3276
  35. Ito T, Chiba T, Yoshida M (2001) Exploring the protein interactome using comprehensive two-hybrid projects. Trends Biotechnol 19:S23–S27
  36. James LC, Roversi P, Tawfik DS (2003) Antibody multispecificity mediated by conformational diversity. Science 299:1362–1367
  37. Jing H, Takagi J, Liu JH, Lindgren S, Zhang RG, Joachimiak A, Wang JH, Springer TA (2002) Archaeal surface layer proteins contain beta propeller, PKD, and beta helix domains and are related to metazoan cell surface proteins. Structure 10:1453–1464
  38. Kabat EA, Wu TT, Perry HM, Gottesman KS, Foeller C (1991) Sequences of proteins of immunological interest. NIH publication No. 91-3242. NIH, Bethesda, MD
  39. Kim WK, Bolser DM, Park JH (2004) Large scale co-evolution analysis of protein structural interlogues using the global protein structural interactome map (PSIMAP). Bioinformatics 20:1138–1150
  40. Klasen M, Spillman FJX, Lorens JB, Wabl M (2005) Retroviral vectors to monitor somatic hypermutation. J Immunol Methods 300:47–62
  41. Kubrycht J, Sigler K (1997) Animal membrane receptors and adhesive molecules. Crit Rev Biotechnol 17:123–147
  42. Kubrycht J, Borecký J, Sigler K (2002) Sequence similarities of protein kinase peptide substrates. Comparison of their primary structures with immunoglobulin repeats. Folia Microbiol 47:319–358
  43. Kubrycht J, Borecký J, Souček P, Ježek P (2004) Sequence similarities of protein kinase substrates and inhibitors with immunoglobulins and model immunoglobulin homologue: cell adhesion molecule from the living fossil sponge Geodia cydonium. Mapping of coherent database similarities and implications for evolution of CDR1 and hypermutation. Folia Microbiol 49:219–246
  44. Lee SS, Tranchina D, Ohta Y, Flajnik MF, Hsu E (2002) Hypermutation in shark immunoglobulin light chain genes results in contiguous substitutions. Immunity 16:571–582
  45. Lepš J (1996) Biostatistics. University of Southern Bohemia, Ceske Budejovice, Czech Republic
  46. Lindstrom-Dinnetz I, Sun SC, Faye I (1995) Structure and expression of hemolin, an insect member of the immunoglobulin gene superfamily. Eur J Biochem 230:920–925
  47. Litman GW, Anderson MK, Rast JP (1999) Evolution of antigen receptors. Annu Rev Immunol 17:109–147
  48. Lossos IS, Levy R, Alizadeh AA (2004) AID is expressed in germinal B-cell-like and activated B-cell-like diffuse large-cell lymphomas and is not correlated with intraclonal heterogeneity. Leukemia 18:1775–1779
  49. Malpeli G, Barbi S, Moore PS, Scardoni M, Chilosi M, Scarpa A, Menestrina F (2004) Primary mediastinal B-cell lymphoma: hypermutation of the Bcl6 gene targets motifs different from those in diffuse large B-cell and follicular lymphomas. Haematologica 89:1091–1099
  50. Marchalonis JJ, Schluter SF (1998) A stochastic model for the rapid emergence of specific vertebrate immunity incorporating horizontal transfer of systems enabling duplication and combinational diversification. J Theor Biol 193:429–444
  51. Marchalonis JJ, Kaveri S, Lacroix-Desmazes S, Kazatchine MD (2002) Natural recognition repertoire and the evolutionary emergence of the combinatorial immune system. FASEB J 16:842–848
  52. Marchler-Bauer A, Panchenko AR, Shoemaker BA, Thiessen PA, Geer LY, Bryant SH (2002) CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res 30:281–283
  53. Morelli C, Karayianni E, Magnanini C, Mungall AJ, Thorland E, Negrini M, Smith DI, Barbanti-Brodano G (2002) Cloning and characterization of the common fragile site FRAF6F, harboring a replicative senescence gene and frequently deleted in human tumors. Oncogene 21:7266–7276
  54. Muller WEG (1998) Origin of Metazoa: sponges as living fossils. Naturwissenschaften 85:11–25
  55. Muller WEG, Schroder HC, Skorokhod A, Bunz C, Muller IM, Grebenjuk VA (2001) Contribution of sponge genes to unravel the genome of the hypothetical ancestor of Metazoa (Urmetazoa). Gene 276:161–173
  56. Muramatsu M, Kinoshita K, Fagarasan S, Yamada S, Schinkai Y, Honjo T (2000) Class switch recombination and hypermutation require activation-induced cytidine deaminase (AID), a potential RNA editing enzyme. Cell 102:553–563
  57. Notredame C (2003) Recent progress in multiple sequence alignments: a survey. Available at: http://www.isrec.isb-sib.ch/∼cschmid/DEA/Module5/lectures/4.2.msa_algorithms.pdf
  58. Ochman H, Lawrence JG, Groisman EA (2000) Lateral gene transfer and nature of bacterial innovation. Nature 405:299–304
  59. Ohno S, Matsunaga T, Wallace RB (1982) Identification of 48-base-long primordial building block sequence of mouse immunoglobulin variable region genes. Proc Natl Acad Sci USA 79:1999–2002
  60. Okazaki I-M, Hiai H, Kakazu N, Yamada S, Muramatsu M, Kinoshita K, Honjo T (2003) Constitutive expression of AID leads to tumorigenesis. J Exp Med 197:1173–1181
  61. Oprea M, Kepler TB (1999) Genetic plasticity of V genes under somatic hypermutation: statistical analyses using a new resampling-based methodology. Genome Res 9:1294–1304
  62. Oprea M, Cowell LG, Kepler TB (2001) The targeting of somatic hypermutation closely resembles that of meiotic mutation. J Immunol 166:892–899
  63. Pancer Z, Mayer WS, Klein J, Cooper MD (2004) Prototypic T cell receptor and CD4-like coreceptor are expressed by lymphocytes in the agnathan sea lamprey. Proc Natl Acad Sci USA 101:13273–13278
  64. Park D, Lee S, Bolser DM, Schroeder M, Lappe M, Oh D, Bhak J (2005) Comparative interactomics analysis of protein family interaction networks using PSIMAP (protein structural interactome map). Bioinformatics 21:3234–3240
  65. Petersen-Mahrt SK, Harris RS, Neuberger MS (2002) AID mutates E. coli suggesting a DNA deamination mechanism for antibody diversification. Nature 418:99–103
  66. Poltoratsky VP, Wilson SH, Kunkel TA, Pavlov YI (2004) Recombinogenic phenotype of human activation-induced cytosine deaminase. J Immunol 172:4308–4313
  67. Potter M, Padlan E, Rudikoff S (1976) Localized deletion-insertion mutations: a major factor in the evolution of immunoglobulin structural variability. J Immunol 117:626–629
  68. Rada C, Yelamos J, Dean W, Milstein C (1997) The 5′ hypermutation boundary of kappa chains is independent of local and neighbouring sequences and related to the distance from the initiation of transcription. Eur J Immunol 27:3115–3120
  69. Ray JL, Nielsen KM (2005) Experimental methods for assaying natural transformation and inferring horizontal gene transfer. Methods Enzymol 395:491–520
  70. Reuven NB, Arad G, Maor-Shoshani A, Livneh Z (1999) The mutagenesis protein UmuC is a DNA polymerase activated by UmuD, RecA, and SSB and is specialized for translesion replication. J Biol Chem 274:31763–31766
  71. Riazuddin S, Khan SN, Ahmed ZM, Ghosh M, Caution K, Nazli S, Kabra M, Zafar AU, Chen K, Naz S, Antonellis A, Pavan WJ, Green ED, Wilcox ER, Friedman PL, Morrel RJ, Riazuddin S, Friedman TB (2006) Mutations in TRIOBP, which encodes a putative cytoskeletal-organizing protein, are associated with nonsyndromic recessive deafness. Am J Hum Genet 78:137–143
  72. Rogozin IB, Diaz M (2004) Cutting Edge: DGYW/WRCH is a better predictor of mutability at G:C bases in Ig hypermutation than the widely accepted RGYW/WRCY motif and probably reflects a two-step activation-induced cytidine deaminase-triggered process. J Immunol 172:3382–3384
  73. Rogozin IB, Kolchanov NA (1992) Somatic hypermutagenesis in immunoglobulin genes. II. Influence of neighbouring base sequences on mutagenesis. Biochim Biophys Acta 1171:11–18
  74. Rossenu S, Dewitte D, Vandekerckhove J, Ampe C (1997) A phage display technique for a fast, sensitive and systematic investigation of protein-protein interactions. J Protein Chem 16:499–503
  75. Rumfelt LL, Lohr RL, Dooley H, Flajnik MF (2004) Diversity and repertoire of IgW and IgM VH families in the newborn nurse shark. BMC Immunol 5:8
  76. Sato A, Mayer WE, Klein J (2003) A molecule bearing an immunoglobulin-like V region of the CTX subfamily in amphioxus. Immunogenetics 55:423–427
  77. Schaffer AA, Aravind L, Madden TL, Shavirin S, Spouge JL, Wolf YI, Koonin EV, Altschul SF (2001) Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and the other refinements. Nucleic Acids Res 29:2994–3005
  78. Schatz DG (2004) Antigen receptor genes and the evolution of a recombinase. Semin Immunol 16:245–256
  79. Schluter SF, Bernstein RM, Marchalonis JJ (1997) Molecular origins and evolution of immunoglobulin heavy-chain genes of jawed vertebrates. Immunol Today 18:543–549
  80. Schneider TD, Stephens RM (1990) Sequence logos: a new way to display consensus sequences. Nucleic Acids Res 18:6097–6100
  81. Schutze J, Skorokhod A, Muller IM, Muller WEG (2001) Molecular evolution of the metazoan extracellular matrix: cloning and expression of structural proteins from the demosponges Suberites domuncula and Geodia cydonium. J Mol Evol 53:402–415
  82. Shapiro GS, Ellison MC, Wysocki LJ (2003) Sequence-specific targeting of two bases on both DNA strands by the somatic hypermutation mechanism. Mol Immunol 40:287–295
  83. Shen HM, Peters A, Baron B, Zhu X, Storb U (1998) Mutation of BCL-6 gene in normal B cells by the process of somatic hypermutation of Ig genes. Science 280:1750–1752
  84. Sheppard DC, Yeaman MR, Welch WH, Phan QT, Fu Y, Ibrahim AS, Filler SG, Zhang M, Waring AJ, Edwards JE Jr (2004) Functional and structural diversity in the Als protein family of Candida albicans. J Biol Chem 279:30480–30489
  85. Simhadri S, Kramata P, Zajc B, Sayer JM, Jerina DM, Hinkle DC, Wei CS (2002) Benzo[a]pyrene diol epoxide-deoxyguanosine adducts are accurately bypassed by yeast DNA polymerase zeta in vitro. Mutat Res 508:137–145
  86. Simossis VA, Kleinjung J, Heringa J (2005) Homology-extended sequence alignment. Nucleic Acids Res 33:816–824
  87. Simpson LJ, Sale JE (2003) Rev1 is essential for DNA damage tolerance and non-templated immunoglobulin gene mutation in a vertebrate cell line. EMBO J 22:1654–1664
  88. Sitnikova T, Su C (1998) Coevolution of immunoglobulin heavy- and light-chain variable-region gene families. Mol Biol Evol 15:617–625
  89. Smith GP (1985) Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface. Science 228:1315–1317
  90. Suzuki T, Shin-I T, Kohara Y, Kasahara M (2004) Transcriptome analysis of hagfish leukocytes: a framework for understanding the immune system of jawless fishes. Dev Comp Immunol 28:993–1003
  91. Teichmann SA, Chothia C (2000) Immunoglobulin superfamily proteins in Caenorhabditis elegans. J Mol Biol 296:1367–1383
  92. Tilson MD, Rzhetsky A (2000) A novel hypothesis regarding the evolutionary origins of the immunoglobulin fold. Curr Med Res Opin 16:88–93
  93. van den Berg TK, Yoder JA, Litman GW (2004) On the origins of adaptive immunity: innate immune receptors join the tale. Trends Immunol 25:11–16
  94. Vogel C, Teichmann SA, Chothia C (2003) The immunoglobulin superfamily in Drosophila melanogaster and Caenorhabditis elegans and the evolution of complexity. Development 130:6317–6328
  95. Washington MT, Johnson RE, Prakash L, Prakash S (2003) The mechanism of nucleotide incorporation by human DNA polymerase eta differs from that of the yeast enzyme. Mol Cell Biol 23:8316–8322
  96. Wedekind JE, Dance GSC, Sowden MP, Smith HC (2003) Messenger RNA editing in mammals: new members of the APOBEC family seeking roles in the family business. Trends Genet 19:207–216
  97. Wiens M, Mangoni A, D’Esposito M, Fattorusso E, Korchagina N, Schroder HC, Grebenjuk VA, Krasko A, Batel R, Muller IM, Muller WEG (2003) The molecular basis for the evolution of the metazoan bodyplan: extracellular matrix-mediated morphogenesis in marine demosponges. J Mol Evol 57:S60–S75
  98. Williams AF, Barclay AN (1988) The immunoglobulin superfamily-domains for cell surface recognition. Annu Rev Immunol 6:381–405
  99. Wojciechowicz D, Lu CF, Kurjan J, Lipke PN (1993) Cell surface anchorage and ligand-binding domains of the Saccharomyces cerevisiae cell adhesion protein alpha-agglutinin, a member of the immunoglobulin superfamily. Mol Cell Biol 13:2554–2563
  100. Wright BE, Schmidt KH, Minnick MF (2004) Mechanism by which transcription can regulate somatic hypermutation. Genes Immun 5:176–182
  101. Yang C, Carlow D, Wolfenden R, Short SA (1992) Cloning and nucleotide sequence of the Escherichia coli cytidine deaminase (ccd) gene. Biochemistry 31:4168–4174
  102. Yang W (2005) Portraits of a Y-family DNA polymerase. FEBS Lett 579:868–872
  103. Yu XQ, Kanost MR (2002) Binding of hemolin to bacterial lipopolysaccharide and lipoteichoic acid. An immunoglobulin superfamily member from insects as a pattern-recognition receptor. Eur J Biochem 269:1827–1834
  104. Zarrin AA, Alt FW, Chaudhuri J, Stokes N, Kaushal D, Du Pasquier L, Tian M (2004) An evolutionarily conserved target for immunoglobulin class-switch recombination. Nat Immunol 5:1275–1281
  105. Zeng X, Negrete GA, Kasmer C, Yang WW, Gearhart PJ (2004) Absence of DNA polymerase eta reveals targeting of C mutations on the nontranscribed strand in immunoglobulin switch regions. J Exp Med 199:917–924
  106. Zhang WL, Kohler B, Oswald E, Beutin L, Karch H, Morabito S, Caprioli A, Sauerbaum S, Schmidt H (2002) Genetic diversity of intimin genes of attaching and effacing Escherichia coli strains. J Clin Microbiol 40:4486–4492
  107. Zhang Z, Schaffer AA, Miller W, Madden TL, Lipman DJ, Koonin EV, Altschul SF (1998) Protein sequence similarity searches using pattern as seeds. Nucleic Acids Res 26:3986–3990
  108. Zvárová J (2001) Biomedical statistics. I. The fundamentals of statistics for biomedical fields. Karolinum, Prague, Czech Republic

Copyright information

© Springer Science+Business Media, Inc. 2006

Authors and Affiliations

  • Jaroslav Kubrycht
    • 1
  • Karel Sigler
    • 2
  • Michal Růžička
    • 3
  • Pavel Souček
    • 1
  • Jiří Borecký
    • 4
  • Petr Ježek
    • 3
  1. Center of Occupational Medicine, National Institute of Public Health, Prague, Czech Republic
  2. Institute of Microbiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
  3. Institute of Physiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
  4. Centro de Biologia Molecular e Engenharia, Universidade Estadual de Campinas, São Paulo
