The CD33-related sialic acid-binding Ig-like lectins (CD33rSiglecs) are predominantly inhibitory receptors expressed on leukocytes. They are distinguishable from conserved Siglecs, such as Sialoadhesin and MAG, by their rapid evolution. A comparison of the CD33rSiglec gene cluster in different mammalian species showed that it can be divided into two subclusters, A and B. The two subclusters, inverted in relation to each other, each encode a set of CD33rSiglec genes arranged head-to-tail. Two regions of strong correspondence provided evidence for a large-scale inverse duplication, encompassing the framework CEACAM-18 (CE18) and ATPBD3 (ATB3) genes, that seeded the mammalian CD33rSiglec cluster. Phylogenetic analysis was consistent with the predicted inversion. Rodents appear to have undergone wholesale loss of CD33rSiglec genes after the inverse duplication. In contrast, CD33rSiglecs expanded in primates, and many are now pseudogenes with features consistent with activating receptors. Unlike the mammalian clusters, the fish CD33rSiglec clusters show no evidence of an inverse duplication, and they display greater variation in cluster size and structure than those of mammals. The close arrangement of other Siglecs and CD33rSiglecs in fish is consistent with a common ancestral region for Siglecs. Expansion of mammalian CD33rSiglecs appears to have followed a large inverse duplication of a smaller primordial cluster over 180 million years ago, prior to eutherian/marsupial divergence. Inverse duplications in general could potentially have a stabilizing effect in maintaining the size and structure of large gene clusters, facilitating the rapid evolution of immune gene families.
Fig. S1 Siglec EnsEMBL gene and protein ID. EnsEMBL gene and protein IDs, as well as chromosome/group/scaffold numbers, are listed for the genes described. Versions of the published assemblies used are included on the last page (DOC 29 kb)
Fig. S2 Human CD33rSiglec gene cluster 2D dot-plot analysis. The sequence of the whole human CD33rSiglec gene cluster is plotted on both the x and y axes. For abbreviations and gene annotations, see Fig. 1. The diagonal line shows one-to-one correspondence between the sequences on the x and y axes. Additional lines indicate regions of similarity between the two sequences. Lines perpendicular to the diagonal indicate regions likely created by an inverse duplication, whereas lines parallel to it show regions likely created by tandem duplication. Perpendicular lines corresponding to CD33rSiglec genes form two distinct sets, as shown by the circles (DOC 778 kb)
Fig. S3 Rhesus versus dog CD33rSiglec gene cluster dot-plot analysis. The sequence corresponding to the whole rhesus macaque CD33rSiglec gene cluster (x-axis) is plotted against that of the dog CD33rSiglec gene cluster (y-axis). For abbreviations and gene annotations, see Fig. 1. Perpendicular lines (PL1 and PL2), like those found in Fig. 2a, b, appear on one side of the fragmented diagonal lines and correspond to the rhesus subcluster B and the dog subcluster A. PL1 corresponds to Siglec-13 and CE18 in rhesus macaque subcluster B and to Siglec-9, -P3 and CE18 in dog subcluster A. PL2 corresponds to the conserved ATB3 of dog subcluster A and an unannotated region between ZNF175 and Siglec-5 in the rhesus macaque (DOC 592 kb)
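The dot-plot reading used in Figs. S2 and S3 (parallel lines for same-orientation duplications, perpendicular lines for inverse duplications) can be illustrated with a minimal sketch. It is not the software used for the figures; it is a toy exact k-mer matcher, and the names `dot_plot`, `kmer_index`, `unit`, `spacer` and the word size `k` are all assumptions made for illustration. Same-strand matches fall on lines where y - x is constant, while reverse-complement matches fall on lines where x + y is constant.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string; unknown bases become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq))


def kmer_index(seq, k):
    """Map every k-mer in seq to the list of positions where it starts."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def dot_plot(x_seq, y_seq, k):
    """Return (forward, inverted) sets of (x, y) k-mer match positions.

    forward  : same-strand matches (lines parallel to the main diagonal)
    inverted : reverse-complement matches (lines perpendicular to it)
    """
    y_index = kmer_index(y_seq, k)
    forward, inverted = set(), set()
    for x in range(len(x_seq) - k + 1):
        word = x_seq[x:x + k]
        for y in y_index.get(word, []):
            forward.add((x, y))
        for y in y_index.get(reverse_complement(word), []):
            inverted.add((x, y))
    return forward, inverted


# Hypothetical toy sequences, not real Siglec data.
unit = "ACGGTCATTGCA"
spacer = "TTTTTTTT"
k = 5
offset = len(unit) + len(spacer)  # start of the second copy

# A unit followed by its inverted copy (inverse duplication).
inverse_dup = unit + spacer + reverse_complement(unit)
fwd_inv, inv_inv = dot_plot(inverse_dup, inverse_dup, k)

# A unit followed by a same-orientation copy (tandem duplication).
tandem_dup = unit + spacer + unit
fwd_tan, inv_tan = dot_plot(tandem_dup, tandem_dup, k)

# Inverted copy: x + y is constant along the perpendicular line.
perpendicular = [(i, offset + len(unit) - k - i) for i in range(len(unit) - k + 1)]
# Tandem copy: y - x is constant along the parallel line.
parallel = [(i, offset + i) for i in range(len(unit) - k + 1)]
```

In this sketch every point in `perpendicular` is found in `inv_inv` and every point in `parallel` is found in `fwd_tan`, which mirrors how the perpendicular and parallel lines in the captions are interpreted. Real cluster-scale comparisons would need tolerance for mismatches and a larger word size.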