Characterization and gene expression analysis of the cir multi-gene family of Plasmodium chabaudi chabaudi (AS)
The pir genes comprise the largest multi-gene family in Plasmodium, with members found in P. vivax, P. knowlesi and the rodent malaria species. Although pir genes comprise up to 5% of the genome, little is known about the functions of the proteins they encode. P. chabaudi causes chronic infection in mice, which may be due to antigenic variation. In this model, pir genes are called cirs and may be involved in this mechanism, allowing evasion of host immune responses. In order to fully understand the role(s) of CIR proteins during P. chabaudi infection, a detailed characterization of the cir gene family was required.
The cir repertoire was annotated and a detailed bioinformatic characterization of the encoded CIR proteins was performed. Two major sub-families were identified, which have been named A and B. Members of each sub-family displayed different amino acid motifs, and were thus predicted to have undergone functional divergence. In addition, the expression of the entire cir repertoire was analyzed via RNA sequencing and microarray. Up to 40% of the cir gene repertoire was expressed in the parasite population during infection, and dominant cir transcripts could be identified. In addition, some differences were observed in the pattern of expression between the cir subgroups at the peak of P. chabaudi infection. Finally, specific cir genes were expressed at different time points during asexual blood stages.
In conclusion, the large number of cir genes and their expression throughout the intraerythrocytic cycle of development indicates that CIR proteins are likely to be important for parasite survival. In particular, the detection of dominant cir transcripts at the peak of P. chabaudi infection supports the idea that CIR proteins are expressed, and could perform important functions in the biology of this parasite. Further application of the methodologies described here may allow the elucidation of CIR sub-family A and B protein functions, including their contribution to antigenic variation and immune evasion.
An important feature of the malaria parasite is the export of proteins to the surface of infected red blood cells (iRBCs). Surface proteins or variant surface antigens (VSAs) have been identified so far in three species infecting humans: Plasmodium falciparum, P. vivax and P. knowlesi, as well as in the rodent malaria parasites P. chabaudi and P. yoelii [1, 2, 3, 4, 5]. These proteins are implicated in antigenic variation and immune evasion, as well as parasite accumulation or sequestration in host tissues; features which may be critical in determining the outcome of malaria (reviewed in [6, 7]). Since VSAs are recognized by antibodies, they are also likely to be important targets for a protective immune response (e.g. [8, 9]).
In most Plasmodium species VSAs are encoded by multi-gene families, usually located in sub-telomeric chromosome regions [2, 10, 11, 12]. In P. falciparum, several VSA gene families have been identified such as var, rif, stevor and surf reviewed by . Among them, the best characterized is the var gene family encoding PfEMP1 proteins, which is implicated in both antigenic variation and sequestration [1, 13]. This family consists of 60 genes that can be grouped into several families (A-E) according to chromosomal location, coding and non-intergenic sequences, direction of transcription, and domain arrangements [14, 15]. Similarly, the rif genes, the largest multi-copy gene family in P. falciparum (circa 150-200 genes), have been divided into subgroups A and B on the basis of sequence similarity [16, 17]. A- and B-type RIFINs have different sub-cellular localizations, in that only the A-type RIFINs appear to be exported towards the surface of iRBCs . Thus, A-type RIFINs may be more likely to play a role in the host/parasite relationship during the blood stages of P. falciparum.
Plasmodium interspersed repeat (pir) genes have been identified in Plasmodium vivax (vir), P. knowlesi (kir), P. berghei (bir), P. chabaudi (cir), and P. yoelii (yir) [2, 18, 19]. Together, pirs form the largest multi-gene family identified to date in Plasmodium species, and occupy up to 5% of the parasite genome. On completion of the P. vivax Salvador I genome sequence, a total of 12 vir sub-families were identified [20, 21, 22]. Similarly, 5 sub-families have been identified within the yir repertoire . However, so far no function has been ascribed to the subgroups of yir and vir families.
Microarray studies carried out on P. vivax and P. yoelii suggest that approximately 50% of the yir and vir repertoires are expressed in a population of iRBCs during infection, with no evidence of preferential transcription according to their chromosomal location or phylogenetic sub-groups [2, 4, 20, 24]. Surprisingly, considering the large proportion of yir genes transcribed in the parasite population, only 1-3 yirs are transcribed in a single iRBC . This suggests that transcription of these genes is under tight control. The function(s) of PIR proteins, however, remain unknown.
Rodent malarias offer the only feasible system in which the contribution of PIR proteins to immune evasion and to sequestration and pathology can be examined. Of these, P. chabaudi is the only rodent species that naturally produces a chronic infection in mice. These chronic infections are likely to be perpetuated by antigenic variation [25, 26, 27]; however, the antigens involved have not yet been determined. P. chabaudi also exhibits other important features observed in human Plasmodium infections, including rosetting and adhesion to host endothelial cells [5, 28], and thus is an ideal model in which to investigate the role of PIR proteins.
The cir genes comprise the largest gene family in P. chabaudi, and are located in sub-telomeric regions of chromosomes [10, 29]. Recently, restriction fragment length polymorphism (RFLP) analysis indicated that cir genes may have tissue specific patterns of expression . However, as such techniques are not gene specific, they are only able to provide clues about general cir expression. Furthermore, very little is known about the function of CIRs in stimulating or evading host immunity during P. chabaudi infection.
Here we describe two major sub-families within the cir repertoire of the recently completed P. chabaudi AS genome. Using an approach similar to that applied to the RIFIN repertoire , we found both conserved and sub-family specific amino acid motifs, and predict functional divergence between the proteins from different CIR sub-families.
We have analyzed in detail the expression of the entire cir repertoire via RNA sequencing (Illumina RNA-seq) and microarray. Up to 40% of the cir gene repertoire was expressed in the parasite population during infection, and dominant cir transcripts could be identified, with some differences in the pattern of expression between the cir subgroups. Finally, we found specific cir genes were expressed at different time points during asexual blood stages. Together these data will allow future investigation of the CIR family to elucidate their roles in the host/pathogen relationship.
This study was carried out in strict accordance with the UK Animals (Scientific Procedures) Act 1986 and was approved by the Ethical Committee of the MRC National Institute for Medical Research, and the British Home Office (PPL: 80/2538).
Annotation of cir genes
cir genes were annotated using Artemis release 11 [31, 32] onto the eight-fold coverage assembly of the Plasmodium chabaudi chabaudi AS genome . Putative coding sequences were identified using conserved features of previously published cir genes from the three-fold coverage genome assembly . These were then searched for similarity to the pir superfamily genes via a combination of basic local alignment search tool (BLAST) and Hidden Markov Model (HMM) on PIR super-family proteins [CIR_BIR_YIR (PF06022)], available in the PFAM database , following previously described methods [35, 36].
Detection of conserved motifs
Motif Elicitation analysis (MEME, ) was used to identify up to 20 conserved amino acid motifs within the CIR repertoire. The average motif locations were identified and plotted onto each gene. WebLogos were generated with the MEME program, using an adaptation of the WebLogo software .
Analysis of sequence similarity
Amino acid sequences of 183 cir genes were aligned using the MUltiple Sequence Comparison by Log-Expectation algorithm (Muscle, ). Sequences aligning poorly with the other CIRs were excluded from the alignment and regions containing large insertions were deleted (Additional file 1). The sequence similarity along the alignment is attached in Additional file 2.
Since phylogenetic trees only represent a bifurcating lineage and imply phylogenetic relationships, reticulate networks were created within the program Splitstree 4.0 . All networks used the algorithms NeighborNet  for calculation of distances and Equal angle  for calculation of splits. 500 bootstrap replicates were generated (Additional file 3).
To support the network, a phylogenetic tree was also constructed using the Maximum Likelihood method from the PhyML server . Here, the evolutionary model applied was Le Gascuel (LG) , and the branch support was calculated by approximate likelihood ratio test (aLRT) . Three YIR  and three BIR sequences (Ulrike Böhme, WTSI, personal communication) were added to the CIR alignment to allow a root to be placed within this tree (Additional file 4).
Clades identified by both methods with high branch support values contained highly similar CIR sequences. Small clades within each major sub-family were denoted numerically: A1-A5 and B1-B4. Members of each clade are tabulated in Additional file 5.
Detection of phylogenetic incompatibilities between cir genes
Phylogenetic incompatibilities within the alignment of 183 CIRs, and each identified sub-family, were analyzed using the pairwise homoplasy index (PHI) in Splitstree v4.0 .
Phylogenetic profiling was used to detect phylogenetic inconsistencies between four cir DNA sequences, selected at random, using a hidden Markov model method within the TOPALi platform v2.5 [47, 48]. The probability of generating each of the three possible tree topologies for the four sequences was modelled in a given 100 nucleotide window. Possible recombination breakpoints were identified where the most probable topology altered at different positions along the alignment.
Five cir quartets, chosen at random, were analyzed per clade by each method described above (Additional file 6).
Function shift analysis
The alignment of 183 CIRs was split into two files containing only A- and B-type CIRs. The two alignments were then used to apply the 'FunShift' methodology , and predict whether these 2 groups of proteins may perform different functions. Positions containing only gaps in a subfamily were not counted.
The method used was the same as described previously by Abhiman and Sonnhammer  with a few modifications. Rate-Shifting Sites (RSS) were defined as positions conserved in one sub-family but variable in the other, and were identified using the likelihood ratio test (LRT) program . The U-values generated by this program indicate the likelihood of rate change for each position in the alignment between the two sub-families. U-values above 4.0 were considered significant at the 5% significance level, as previously described .
Conservation-Shifting Sites (CSS) were defined as positions that were conserved in both groups, but containing different residues in each. CSS were detected using the method developed by Abhiman and Sonnhammer . This calculates a Z-score based on the normalized cumulative relative entropy at each position of the alignment, between the two sub-families. Z-scores exceeding 0.5 per alignment position were considered significant .
CSS and RSS are plotted within a sub-section of the alignment in Additional file 7.
Mice and parasites
Female BALB/c and C57BL/6 mice aged 6-8 weeks were obtained from the specific pathogen-free unit at the MRC National Institute for Medical Research (NIMR), London. For experimental purposes, mice were housed conventionally with sterile bedding, food and irradiated water on a 12 hour light-dark cycle.
A cloned line of Plasmodium chabaudi chabaudi (AS) was used in this study . Stabilates were cryo-preserved in blood from BALB/c mice. To obtain parasites for experimental infection, an aliquot of the stabilate was injected intraperitoneally (i.p.) into immunodeficient BALB/c RAG2-/- mice . Blood was taken from the donor mice 7 days after infection and experimental mice were infected by injecting 10^5 infected erythrocytes i.p. Parasitaemia was monitored by examination of Giemsa-stained blood films as previously described .
Blood was collected from each mouse by cardiac puncture under terminal anaesthesia into Krebs saline (114 mM NaCl, 4.57 mM KCl, 1.15 mM MgSO4) containing 0.2% glucose and 25 U/ml heparin (Leo Pharmaceuticals) 7 days after infection. Leukocytes were removed via Plasmodipur filtration (Euro-Diagnostica) according to manufacturer's instructions. Blood was then stored at -80°C in TRIZOL reagent (Invitrogen) for subsequent RNA extraction.
RNA was extracted from P. chabaudi infected blood samples by guanidinium thiocyanate-phenol-chloroform extraction according to standard methods  and DNase digested using Turbo DNAse (Ambion) according to the manufacturer's instructions.
Microarray hybridization and analysis
P. chabaudi AS is a highly synchronous parasite for which development in the blood follows its host's circadian rhythm. Twelve time-points were collected, one every two hours, to cover the entire 24 h cycle of blood stage development. At the peak of parasitaemia, one mouse was sacrificed at each time point and thin blood films were made and stained with Giemsa for optical microscopy. The pan-rodent microarray was designed using the OligoRankPick program as previously described . The RNA preparation, Cy-dye coupling to cDNA, hybridization and slide scanning were performed as described by Bozdech and colleagues .
Data processing and analysis (including the Fast Fourier Transform) were carried out as described by Bozdech and colleagues . The phaseogram (Figure 7) contains genes with > 1.7 log2 ratio of change in mRNA abundance across the IDC. A list of identified genes can be found in Additional file 8.
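The selection and ordering of genes for a phaseogram can be illustrated with a minimal sketch. It assumes that the "> 1.7 log2 ratio of change" criterion refers to the peak-to-trough difference in log2 ratio across the time course, and that genes are ordered by the phase of the first Fourier harmonic; both readings are assumptions made for illustration, and the function names and example values are hypothetical rather than taken from the published pipeline.

```python
import cmath
import math


def expression_range(log2_ratios):
    """Peak-to-trough change in log2 ratio across the time course."""
    return max(log2_ratios) - min(log2_ratios)


def select_for_phaseogram(profiles, min_change=1.7):
    """Keep genes whose log2 ratio changes by more than min_change.

    Assumption: 'change' is taken as max minus min over the time points.
    """
    return {gene: series for gene, series in profiles.items()
            if expression_range(series) > min_change}


def peak_time_index(series):
    """Estimate the time-point index of peak expression.

    Uses the phase of the first harmonic of a discrete Fourier transform,
    i.e. the best-fitting single cosine with a period of one full cycle.
    """
    n = len(series)
    first_harmonic = sum(x * cmath.exp(-2j * math.pi * t / n)
                         for t, x in enumerate(series))
    return (-cmath.phase(first_harmonic)) * n / (2 * math.pi) % n


# Hypothetical log2 ratio profiles over twelve 2-hourly time points.
profiles = {
    "gene_a": [-1.0, -0.5, 0.2, 0.9, 1.1, 0.4, -0.3, -0.8, -1.1, -0.9, -0.6, -0.2],
    "gene_b": [0.1, 0.2, 0.0, -0.1, 0.3, 0.2, 0.1, 0.0, -0.2, 0.1, 0.0, 0.2],
}
selected = select_for_phaseogram(profiles)
ordered = sorted(selected, key=lambda gene: peak_time_index(selected[gene]))
```

Under these assumptions only gene_a passes the filter, since its log2 ratio spans 2.2 units whereas gene_b spans 0.5.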
10 μg of P. chabaudi total RNA obtained from BALB/c and C57BL/6 mice was used for this analysis. RNA sequencing was performed using an Illumina GAIIx following the methodology described by Otto and colleagues  and was used to create 76 bp paired-end reads. TopHat  was used to map reads against the Plasmodium chabaudi chabaudi AS reference genome , with maximum intron size set to 10000 and inner-mate distance set to 100. Gene expression levels (RPKM) were calculated as defined by Mortazavi and colleagues  with minor modifications. Non-uniquely mapping reads were excluded and read-length windows of protein-coding regions that were non-unique were excluded from the gene length term used in calculating RPKM.
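As an illustration of the modified RPKM calculation, the sketch below applies the standard definition (reads per kilobase of transcript per million mapped reads) while counting only uniquely mapping reads and subtracting non-unique positions from the gene length. The function name, argument names and example numbers are hypothetical, and whether the total-read term was also restricted to uniquely mapped reads is not stated in the text.

```python
def rpkm(unique_reads, total_mapped_reads, gene_length_bp, non_unique_bp=0):
    """Reads per kilobase per million mapped reads for one gene.

    unique_reads       -- reads mapping uniquely to the gene
    total_mapped_reads -- all mapped reads in the sequencing run
    gene_length_bp     -- length of the protein-coding region
    non_unique_bp      -- coding positions excluded as non-unique
    """
    effective_length = gene_length_bp - non_unique_bp
    if total_mapped_reads <= 0 or effective_length <= 0:
        raise ValueError("read total and effective length must be positive")
    # 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling.
    return unique_reads * 1e9 / (total_mapped_reads * effective_length)


# Hypothetical gene: 500 unique reads, 1000 bp, of which 200 bp are non-unique.
example_rpkm = rpkm(500, 10_000_000, 1000, non_unique_bp=200)
```

Shortening the length term raises the RPKM of genes with repetitive regions, which compensates for the reads discarded from those regions.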
Each sequencing run contained different amounts of noise (e.g. reads mapping to generally unexpressed parts of the genome: introns and intergenic regions). Therefore we calculated RPKMs over 500 bp windows of exonic and intronic sequences on chromosome 14. For each sequencing run, we took an RPKM cutoff above which only 10% of intron sequences were expressed. A list of identified genes can be found in Additional file 9, with the threshold calculation in Additional file 10. Legends for Additional files 1-10 can be found in Additional file 11.
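The per-run background cutoff can be sketched as follows, assuming it corresponds to the RPKM value exceeded by at most 10% of the intronic windows. The exact calculation is given in Additional file 10 and may differ; the function names and values here are hypothetical.

```python
def noise_cutoff(intron_window_rpkms, fraction_above=0.10):
    """RPKM value exceeded by at most `fraction_above` of intronic windows."""
    if not 0 <= fraction_above < 1:
        raise ValueError("fraction_above must be in [0, 1)")
    ordered = sorted(intron_window_rpkms)
    n = len(ordered)
    if n == 0:
        raise ValueError("no intronic windows supplied")
    # Number of windows permitted to lie above the cutoff.
    allowed = int(fraction_above * n + 1e-9)
    # The cutoff is the largest value outside that top group.
    return ordered[n - allowed - 1]


def expressed_genes(rpkm_by_gene, cutoff):
    """Genes whose RPKM is strictly above the background cutoff."""
    return sorted(g for g, value in rpkm_by_gene.items() if value > cutoff)


# Hypothetical RPKMs for ten 500 bp intronic windows in one sequencing run.
intron_windows = [0.0, 0.1, 0.2, 0.0, 0.5, 1.0, 0.3, 0.0, 4.0, 0.2]
cutoff = noise_cutoff(intron_windows)
expressed = expressed_genes({"cir_x": 12.0, "cir_y": 0.8, "cir_z": 1.0}, cutoff)
```

Because the cutoff is derived separately for each run, a run with more intronic signal yields a higher threshold and therefore fewer genes called as expressed.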
A Kolmogorov-Smirnov test  was used to compare the distribution of cir sub-groups according to their level of expression.
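For reference, the two-sample Kolmogorov-Smirnov statistic D is the largest vertical distance between the empirical cumulative distributions of the two samples. The sketch below computes D only, using hypothetical expression values; the P-value would normally be obtained from a statistics package (for example `scipy.stats.ks_2samp`), and the text does not state which software was used.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The maximum difference is always reached at an observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical expression values for two groups of genes.
group_one = [1, 2, 3, 4]
group_two = [3, 4, 5, 6]
d_value = ks_statistic(group_one, group_two)
```

D ranges from 0 (identical empirical distributions) to 1 (no overlap between the two samples).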
I) Bioinformatic analysis of the CIR multi-gene family
a) Identification of cir genes
Completion of the P. chabaudi AS genome sequencing and revised assembly allowed the initial cir annotation to be manually revisited. Conserved features from the initially identified cir genes [18, 29], such as the relative exon lengths, splice sites and amino acid sequences were used to identify putative cir coding sequences in the assembled contigs. 117 cir genes were first identified and manually annotated. A hidden Markov model (HMM) was then constructed on the basis of the identified cir repertoire and was used to detect more divergent cir genes, bringing the total of identified cirs to 196 , including 3 additional cir-like genes, containing some but not all expected features of cir family members.
Only three partial cir genes were identified, arising from the few unresolved contig assemblies for the P. chabaudi AS genome. Three long cir genes were also identified, containing an extended first exon. The majority of cir genes contained one predicted TM domain, including some of the divergent cirs.
b) Sequence similarity of CIR proteins
In order to investigate similarity between CIRs, the amino acid sequences identified during cir gene annotation were aligned using Muscle , and refined manually (Additional file 1a). Upon addition of more divergent cir genes to the repertoire, 16 sequences aligned poorly with the majority of CIRs and were excluded from the analysis (tabulated in Additional file 1b). These included the three partial CIRs, eight CIR sequences with low C- or N-terminal similarity to other members of the repertoire, two CIRs encoded by genes with an atypical structure and three sequences which had been identified as cir-like genes, containing some but not all features of cir genes.
Figure 1 shows two distinct sub-families, each of which comprises multiple smaller clades, the members of which are shown in Additional file 5. The large sub-family A contained more divergent CIR sequences, many of which were identified by the CIR HMM. Five clades containing more than five CIR sequences were present within this sub-family: A1-A5 (highlighted in red, Figure 1), of which clade A1 was the most distinct. Sub-family B was comprised of the most conserved CIR sequences, and could also be further sub-divided into 4 clades: B1-B4 (highlighted in blue, Figure 1). The definition of the A1 group as a part of sub-family A was further reinforced by the creation of a phylogenetic tree to support the clades identified by the network in Figure 1. The Maximum Likelihood tree contained the same sub-families and smaller clades as observed in Figure 1, even after three YIR and three BIR sequences (, Böhme et al., unpublished) were added to the CIR alignment to enable a root to be placed (Additional file 4 TreeBase study accession URL http://purl.org/phylo/treebase/phylows/study/TB2:S12458 [65, 66, 67]). The clade A1 continued to cluster within CIR sub-family A, indicating that these CIR sequences were more similar to each other than to members of CIR sub-family B.
In addition, the visualization of CIR sequence relationships using network methodology indicated phylogenetic inconsistencies within the cir repertoire, as many box-like structures (reticulations) were present (Figure 1). Across the whole cir repertoire, different regions of cir genes were found to display different phylogenetic relationships with each other (Additional file 6). Such phylogenetic incompatibilities may arise from data ambiguity, convergent evolution or recombination events , suggesting that recombination may have played a strong role in shaping the cir repertoire.
c) Identification of conserved amino acid motifs
Motifs 1 and 3 were the most conserved, being present in 98.66% of CIR sequences. Motif 1 contained the almost invariant YK residues, corresponding to the start of the third cir exon and part of the predicted CIR transmembrane domain. Motif 3 was found within the second cir exon and contained the highly conserved sequence YAILWLSY. Motif 10, found at the N-terminus of 91.28% of CIRs, contained some degeneracy, but a clearly conserved methionine marked the N-terminus of the CIR proteins. All CIRs possessing motif 10 also contained a cysteine six residues into the sequence. Conserved cysteine residues were also located within motifs 5, 7 and 14. Motifs 9, 3, 11, 5, 4 and 1 were also found in members of all sub-families.
The remaining motifs appeared to have sub-family specific distributions, with motifs 2, 6 and 7 only found within sub-family B sequences. By contrast, motifs 8, 12, 13 and 14 were specific to members of CIR sub-family A. The arrangement of motifs within the amino acid sequence is indicated for a typical member of each major sub-family in Figure 2b.
Part of motif 8 (unique to sub-family A) lies within the predicted TM domain and the TM domains of A-type CIRs contain more proline residues than B-type CIRs. In addition the majority of CIRs containing more than one TM domain are found within sub-family A. Together, these differences indicate that the TM domain of CIR sub-type A proteins may have altered properties compared to the rest of the CIRs.
d) Similarities between CIRs and RIFINs and function shift analysis
The identification of two major CIR sub-families in Figure 1 resembled the P. falciparum RIFIN repertoire organization [16, 17]. As the rif and stevor multi-gene families have been suggested to be distantly related to the pir genes , we compared the CIR and RIFIN repertoires.
The RIFIN sub-family A is defined by the presence of a 25 amino acid sequence, which is absent from RIFIN-B types [16, 17]. An insertion sequence could be detected only in members of CIR sub-family A, and was most conserved in the clade A1. This was located between positions 253 and 316 of the alignment of 183 CIRs (Additional file 7a). Furthermore, this insertion (displayed as a weblogo, ) showed some similarities with the A-type RIFIN insertion sequence (Additional file 6b and c, [16, 17]). Approximately a third of the residues in each insertion sequence were hydrophobic and very few basic residues were present. Notably, both sequences included two conserved cysteine residues.
Highly significant rate and conservation shifting sites identified between the two major CIR sub-families
Position in alignment
Residues in CIR sub-family:
Significant conservation shifting sites (CSS):
G W S R H Y I N L Q D
I V L F S
C S W K L F Y I Q
F L I N
N K S G D
S L F
I A V T S M N Y K H L
C F R Y L
Y E R G V K A I R
P S A L T
S T K A L V P Y F I
W R Q S I G C
R T K G A V N
D G H
K E M N T Y
K E F T N I
Q H K R N
T A I K V
Significant rate shifting sites (RSS):
M D K T N I L
V N K G A
T Y D V S E K N Q P G I
E K T A S Q P I S D R L Y G N
E P F L T
I Q D E N P A S G H L K V T R
S E D G Q R
I P S V T N A K E Q L C Y H G
S F N T P L
I V L D
L R V I P
II) Analysis of cir gene expression during Plasmodium chabaudi infection
a) RNA sequencing analysis
In addition, mRNA was sequenced from parasites passaged in two C57BL/6 mice. Both the BALB/c and C57BL/6 inbred strains of mice are commonly used hosts for P. chabaudi infection (for example: [68, 69, 70, 71]); thus it was important to determine whether either host genetic background could influence cir expression. While in the C57BL/6 samples only 17% and 12% of the cir repertoire were expressed above background, these lower values were at least partly the result of a higher RPKM cutoff due to relatively high expression of annotated introns and thus probably do not reflect differences in cir expression between BALB/c and C57BL/6 mice.
A list of the ten most highly expressed cir genes was established for each sample (Figure 4b). As their expression was most highly detected in a population of parasites (either due to higher levels of expression in selected iRBCs or expression in a higher number of iRBCs), these will henceforth be referred to as "dominant cirs". When we compared the lists of dominant cirs thus established, it was evident that most of these genes (> 60%) were different between the samples analyzed (Figure 4c). This indicates that the relative level of cir transcripts may vary from one mouse to another, which could reflect antigenic variation, immune selection of iRBCs or sampling differences upon infection of the mice. Whilst the overall level of cir expression was significantly lower than for other parasite genes (Kolmogorov-Smirnov test; D = 0.6742, P-value < 2.2e-16, Figure 4d), the dominant cir transcripts were found amongst the most highly expressed parasite genes.
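The comparison of dominant cir lists between samples can be sketched as below. This is an illustrative reading in which genes are ranked by RPKM and the fraction of one sample's dominant genes absent from another's list is reported; how the > 60% figure was derived is not detailed in the text, and the gene identifiers and values are hypothetical (a top-three list is used to keep the example short).

```python
def dominant_genes(rpkm_by_gene, n=10):
    """The n most highly expressed genes, highest first."""
    ranked = sorted(rpkm_by_gene.items(), key=lambda item: item[1], reverse=True)
    return [gene for gene, _ in ranked[:n]]


def fraction_not_shared(dominant_a, dominant_b):
    """Fraction of dominant genes in sample A that are absent from sample B."""
    shared = set(dominant_a) & set(dominant_b)
    return 1 - len(shared) / len(dominant_a)


# Hypothetical RPKM values for cir genes in two samples.
sample_one = {"cir_01": 900.0, "cir_02": 50.0, "cir_03": 40.0, "cir_04": 5.0}
sample_two = {"cir_01": 700.0, "cir_05": 60.0, "cir_06": 30.0, "cir_02": 10.0}

top_one = dominant_genes(sample_one, n=3)
top_two = dominant_genes(sample_two, n=3)
differing = fraction_not_shared(top_one, top_two)
```

In this toy example one gene is shared and two of the three dominant genes differ, so the two lists are largely distinct even though the single most abundant transcript is the same in both samples.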
Surprisingly, the same gene (PCHAS_110030) was expressed more than ten-fold higher than any other cir gene in all the BALB/c samples analyzed. Similarly, three other cirs (PCHAS_140140, PCHAS_070130, and PCHAS_000430) were also dominant in these four samples. Furthermore, all of these genes, except PCHAS_140140, were dominant in at least one of the two C57BL/6 mice studied (Additional file 9). This indicates that some cir genes are consistently highly represented in a population of Plasmodium chabaudi AS parasites at the late trophozoite stage during the peak of parasitemia (8 days post infection).
However, the most highly expressed cirs were all members of sub-family A (Figure 5b). Indeed, PCHAS_110030, the most highly expressed cir gene in the six samples analyzed, belonged to this sub-family. We investigated whether this was also the case for the other dominant genes identified previously (Figure 4b). However, in each sample analyzed, about half of the dominant cir genes identified belonged to each sub-family (on average, 52.5% ± 18.9% of the dominant genes were A-type cirs and 47.5% ± 18.9% were members of sub-family B, Figure 5c, indicated by black dots in Additional file 3). This observation was confirmed in the C57BL/6 mice (on average, 50% ± 14.14% of the dominant cirs belonged to sub-family A and 50% ± 14.14% were B-type cirs).
b) Microarray analysis
Timing of transcription could be an alternative mechanism by which CIRs of different subgroups mediate different functions; we therefore investigated the transcriptional pattern of these genes throughout the intra-erythrocytic developmental cycle (IDC) in P. chabaudi. For this purpose parasites were collected at 2-hour intervals from mice with a synchronous infection of P. chabaudi for a total of 24 h, representing one complete IDC of the parasite. The RNA was extracted and cir transcription was determined using the pan-rodent Plasmodium spp. microarray, which has been described previously .
Differences in absolute levels of transcription of the two cir sub-families could not be quantified using this approach, as two-channel hybridization only allows relative abundance comparison for each gene and not between different genes. However, many members of each sub-family could be detected at every time point. These data also strongly suggest that different cirs are indeed transcribed at different stages of parasite development in the erythrocyte, in line with their products having different functions.
Interestingly, several of the dominant cir transcripts from the RNA sequencing experiments were also detected in this microarray timecourse, despite the fact that this material was obtained from separate P. chabaudi infections. These cirs were: PCHAS_110030, PCHAS_140140, PCHAS_110020, PCHAS_001090, PCHAS_070130, PCHAS_000110 and PCHAS_000120, which comprised approximately a third of the dominant cirs identified by RNA sequencing analysis. In addition, as each time point represents the material from a single mouse, the progression of cir transcription appeared to be conserved in the mice that were infected with a single starting parasite population, suggesting that limited switching occurs within a single cycle of asexual development. Importantly, the peak timing of expression of these cirs detected by microarray is consistent with the parasite stage used for the RNA sequencing analysis, further supporting the notion that the timing of expression of different CIRs is conserved across independent infections.
In total, 196 cir genes have been identified and annotated in the P. chabaudi genome. The cir genes distribute into two major groups, according to sequence similarity. Many cirs from both sub-families are transcribed during P. chabaudi infection, although dominant cir transcripts are present. Differences between the sub-families, such as sub-family specific amino acid motifs and altered residue conservation, particularly within the predicted TM domain, indicate that the two sub-families could be functionally divergent. Recombination of cir genes may have occurred within the cir repertoire, consistent with that observed in other Plasmodium multi-gene families including var and sicavar (for example: [63, 64]).
Similar to the VIR and YIR gene families [2, 20, 22, 23], we confirm that there are also sub-groups within the CIR repertoire in agreement with Ebbinghaus and Krucken . As each of the major groups defined here represented almost half of the CIRs, we defined these as the sub-families: A, comprising the more divergent CIRs; and B, containing more conserved CIRs. These CIR sub-families could each be further sub-divided into smaller clades. Despite the different analysis of CIRs recently described , the same groups could be identified within our network (as clade A1 and sub-family B), indicating that the sequences in each of these groups are indeed highly similar to each other. While the overall conservation of PIRs across species is relatively low, some evidence exists for sub-families being shared between P. yoelii and P. berghei , as well as P. chabaudi, P. yoelii and P. berghei  suggesting common functions.
The extensive sequence variation often seen in multi-gene families like var, rif and stevor in P. falciparum, or the pirs in P. vivax and rodent malaria parasites, is thought to reflect that these genes are under immune pressure (e.g. [2, 29, 72, 73, 74]). In our analysis, however, we have identified 14 conserved amino acid motifs, suggesting that some parts of the CIR sequences are under constraints important for the biological functions of these proteins. While some motifs were found in the majority of CIRs, motifs unique to each of the two sub-families were also detected. This divergence of conserved amino acid sequence motifs in the different sub-groups presents strong evidence that the different CIR sub-types may indeed carry out distinct but related functions in the parasite, as suggested also for PIR and RIFIN sequences . Since the CIR A sub-family contained the most divergent CIRs, these proteins may be exposed to stronger selective pressure, and it is possible that their diverse nature enables immune evasion in vivo.
In line with the functional diversification of CIR subtypes A and B, some CIRs belonging to sub-type A possessed a unique amino acid motif (motif 8) at the beginning of the predicted TM domain. This, in addition to different conservation of proline residues and other amino acids within the predicted TM domains of each sub-family, suggests that CIR sub-family A proteins may have a functionally different TM domain than the B-type CIRs, perhaps altering which membrane the proteins are able to span.
Janssen and colleagues have predicted that rif and stevor could be ancestral to the pir genes due to similarities in terms of size, amino acid motifs and conservation of the first intron . The observations that the CIR repertoire could be separated into two major sub-types namely thanks to an insertion within CIR sub-type A, a section of which contained two highly conserved cysteine residues [similar to RIFIN sub-type A, [16, 17]], is strong evidence that the CIR family displays further parallels to the RIFINs. The variation we observed in the number and position of cysteine residues in the two CIR sub-families could significantly impact on the overall protein tertiary structure, and thereby function. As predicted for the A and B type RIFINs [17, 49], we found high proportions of rate and conservation shifting sites between the two major CIR sub-families, providing further support for the idea of functional divergence.
Transcriptional differences in both amount and timing can indicate functional differences between related genes. Until now, only Northern blot, degenerate PCR and restriction fragment length polymorphism (RFLP) analyses have been used to give indications of cir gene expression during P. chabaudi infection [29, 30]. Here, we have used both mRNA sequencing and microarray studies to obtain a detailed picture of cir transcription in vivo. The expression of up to 40% of the cir genes could be detected during an infection. Different cirs were transcribed at different times during the IDC of the parasite, with distinct cirs being transcribed in ring, trophozoite and schizont stages. The timing of transcription for a specific cir appeared to be conserved from one animal to another (as seen in the microarray experiments) as well as across different experiments (as seen in the good overlap between the genes transcribed in the trophozoite stage as detected by both RNA sequencing and microarray), suggesting that CIRs may carry out distinct ring-, trophozoite- or schizont-specific functions. This is further supported by the observation that pir transcription follows a similar pattern in both P. vivax and P. yoelii, where 59% of the virs and 42% of the yirs were expressed, respectively [4, 24].
Whilst most cir genes were transcribed at lower levels than other parasite genes, this most likely reflects differences in the proportion of parasites that actually express a particular cir, resulting in populations of iRBCs each expressing different cir genes. This would be consistent with observations made for virs, yirs and vars [1, 4, 75], that individual iRBCs transcribe only one or a small number of these genes. It is therefore of particular interest that three cir genes were represented at relatively high levels in all six samples analyzed by RNA sequencing (PCHAS_110030, PCHAS_070130, and PCHAS_000430), with PCHAS_110030 being detected at a level more than 10-fold higher than any other cir in all the samples analyzed. In a classic model of antigenic variation, it would be expected that a single member, or a relatively small number of members, of a gene family would be transcribed in any population, with other members of the gene family being transcribed at much lower levels. This is exactly what we have detected during P. chabaudi infection.
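The notion of a dominant transcript used above (one gene detected at more than 10-fold the level of every other family member in a sample) can be illustrated with a minimal Python sketch. This is not the analysis pipeline used in this study: the function name `dominant_transcript`, the gene labels and the expression values are all hypothetical and chosen only for illustration, and the 10-fold threshold is simply taken from the text.

```python
from typing import Dict, Optional


def dominant_transcript(expression: Dict[str, float],
                        min_fold: float = 10.0) -> Optional[str]:
    """Return the gene whose level exceeds `min_fold` times that of every
    other gene in the sample, or None if no gene meets that criterion."""
    if len(expression) < 2:
        return None  # a fold comparison needs at least two genes
    # Rank genes from highest to lowest expression level.
    ranked = sorted(expression.items(), key=lambda item: item[1], reverse=True)
    (top_gene, top_level), (_, runner_up_level) = ranked[0], ranked[1]
    if runner_up_level <= 0:
        # Every other gene is undetected; the top gene dominates if detected.
        return top_gene if top_level > 0 else None
    # Comparing against the runner-up is enough: all other genes are lower.
    return top_gene if top_level / runner_up_level > min_fold else None


# Hypothetical normalized expression values (NOT data from this study).
sample = {"cir_a": 520.0, "cir_b": 41.0, "cir_c": 12.5, "cir_d": 0.0}
print(dominant_transcript(sample))
```

Applying such a check to each sample separately, and then asking whether the same gene is returned in every sample, would mirror the comparison across the six RNA sequencing samples described above.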
While our data suggest that cirs belonging to sub-family A are more highly transcribed than those belonging to sub-family B, these differences are relatively small. At this stage it is not clear whether there is a preference for the transcription of the A or B sub-family. What is evident, though, is that both A and B sub-family members were transcribed at the peak of a blood-stage P. chabaudi infection. Considering that the sequence analysis carried out here strongly suggests functional differences between the two sub-types, the expression of both sub-families may indicate that A and B-type CIRs perform complementary but non-overlapping roles at different stages during infection.
Here we have presented a thorough analysis of the cir repertoire, highlighting similarities between these genes and other multi-gene families. The two CIR sub-families have been predicted to have different functions, as shown for the RIFINs of P. falciparum. Whilst these functions remain unknown, the large number of cir genes and their expression throughout the IDC indicate that CIR proteins are likely to play key roles in the biology of the parasite. These may include antigenic variation and immune evasion. Our application of RNA sequencing during P. chabaudi infection has enabled the detection of dominant cir transcripts for the first time, supporting such roles. Further application of these methodologies may elucidate the functions of CIR proteins and help to clarify the roles that members of the CIR sub-families play during infection.
We are grateful to the PFAM group at the Wellcome Trust Sanger Institute for creating an HMM based on the cir genes, and to Saraswathi Abhiman and Erik L. Sonnhammer for allowing us to use the Funshift program in our analysis.
This work was supported by the Medical Research Council, UK (file reference U117584248); Singapore A*Star-UK MRC Joint Grant (A*Star reference 10/1/22/24/630); Singapore Nanyang Technological University (WBS N° M47080081); and the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement N° 242095. Jennifer Lawton was in receipt of an MRC PhD scholarship.
- 1. Joergensen L, Bengtsson DC, Bengtsson A, Ronander E, Berger SS, Turner L, Dalgaard MB, Cham GK, Victor ME, Lavstsen T: Surface co-expression of two different PfEMP1 antigens on single Plasmodium falciparum-infected erythrocytes facilitates binding to ICAM1 and PECAM1. PLoS Pathogens. 2010, 6 (9): e1001083. 10.1371/journal.ppat.1001083.
- 2. del Portillo HA, Fernandez-Becerra C, Bowman S, Oliver K, Preuss M, Sanchez CP, Schneider NK, Villalobos JM, Rajandream MA, Harris D: A superfamily of variant genes encoded in the subtelomeric region of Plasmodium vivax. Nature. 2001, 410 (6830): 839-842. 10.1038/35071118.
- 4. Cunningham D, Fonager J, Jarra W, Carret C, Preiser P, Langhorne J: Rapid changes in transcription profiles of the Plasmodium yoelii yir multigene family in clonal populations: lack of epigenetic memory?. PLoS One. 2009, 4 (1): e4285. 10.1371/journal.pone.0004285.
- 9. Marsh K, Otoo L, Hayes RJ, Carson DC, Greenwood BM: Antibodies to blood stage antigens of Plasmodium falciparum in rural Gambians and their relation to protection against infection. Transactions of the Royal Society of Tropical Medicine and Hygiene. 1989, 83 (3): 293-303. 10.1016/0035-9203(89)90478-1.
- 10. Fischer K, Chavchich M, Huestis R, Wilson DW, Kemp DJ, Saul A: Ten families of variant genes encoded in subtelomeric regions of multiple chromosomes of Plasmodium chabaudi, a malaria species that undergoes antigenic variation in the laboratory mouse. Mol Microbiol. 2003, 48 (5): 1209-1223. 10.1046/j.1365-2958.2003.03491.x.
- 13. Bachmann A, Predehl S, May J, Harder S, Burchard GD, Gilberger TW, Tannich E, Bruchhaus I: Highly co-ordinated var gene expression and switching in clinical Plasmodium falciparum isolates from non-immune malaria patients. Cellular Microbiology. 2011, 13 (9): 1397-1409. 10.1111/j.1462-5822.2011.01629.x.
- 16. Petter M, Haeggstrom M, Khattab A, Fernandez V, Klinkert MQ, Wahlgren M: Variant proteins of the Plasmodium falciparum RIFIN family show distinct subcellular localization and developmental expression patterns. Mol Biochem Parasitol. 2007, 156 (1): 51-61. 10.1016/j.molbiopara.2007.07.011.
- 22. Merino EF, Fernandez-Becerra C, Durham AM, Ferreira JE, Tumilasci VF, d'Arc-Neves J, da Silva-Nunes M, Ferreira MU, Wickramarachchi T, Udagama-Randeniya P: Multi-character population study of the vir subtelomeric multigene superfamily of Plasmodium vivax, a major human malaria parasite. Mol Biochem Parasitol. 2006, 149 (1): 10-16. 10.1016/j.molbiopara.2006.04.002.
- 23. Fonager J, Cunningham D, Jarra W, Koernig S, Henneman AA, Langhorne J, Preiser P: Transcription and alternative splicing in the yir multigene family of the malaria parasite Plasmodium y. yoelii: identification of motifs suggesting epigenetic and post-transcriptional control of RNA expression. Molecular and Biochemical Parasitology. 2007, 156 (1): 1-11. 10.1016/j.molbiopara.2007.06.006.
- 24. Bozdech Z, Mok S, Hu G, Imwong M, Jaidee A, Russell B, Ginsburg H, Nosten F, Day NP, White NJ: The transcriptome of Plasmodium vivax reveals divergence and diversity of transcriptional regulation in malaria parasites. Proc Natl Acad Sci USA. 2008, 105 (42): 16290-16295. 10.1073/pnas.0807404105.
- 33. The Plasmodium chabaudi homepage in GeneDB. [http://www.genedb.org/Homepage/Pchabaudi]
- 34. Hall N, Karras M, Raine JD, Carlton JM, Kooij TWA, Berriman M, Florens L, Janssen CS, Pain A, Christophides GK: A comprehensive survey of the Plasmodium life cycle by genomic, transcriptomic, and proteomic analyses. Science. 2005, 307 (5706): 82-86. 10.1126/science.1103717.
- 35. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K: The Pfam protein families database. Nucleic Acids Res. 2010, 38 (Database issue): D211-D222.
- 37. Bailey TL, Elkan C: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. 1994, 2: 28-36.
- 43. Guindon S, Lethiec F, Duroux P, Gascuel O: PHYML Online-a web server for fast maximum likelihood-based phylogenetic inference. Nucleic Acids Res. 2005, 33 (Web Server issue): W557-W559.
- 48. Milne I, Lindner D, Bayer M, Husmeier D, McGuire G, Marshall DF, Wright F: TOPALi v2: a rich graphical interface for evolutionary analyses of multiple alignments on HPC clusters and multi-core desktops. Bioinformatics. 2009, 25 (1): 126-127. 10.1093/bioinformatics/btn575.
- 49. Abhiman S, Sonnhammer EL: FunShift: a database of function shift analysis on protein subfamilies. Nucleic Acids Res. 2005, 33 (Database issue): D197-D200.
- 54. Kyes S: Reliable RNA preparation for Plasmodium falciparum. Methods in Malaria Research. Edited by: Ljungström I, Perlmann H, Schlichtherle M, Scherf A, Wahlgren M. 2004, Manassas, Virginia, 175-178. Fourth edition.
- 57. Otto TD, Wilinski D, Assefa S, Keane TM, Sarry LR, Bohme U, Lemieux J, Barrell B, Pain A, Berriman M: New insights into the blood-stage transcriptome of Plasmodium falciparum using RNA-Seq. Mol Microbiol. 2010, 76 (1): 12-24. 10.1111/j.1365-2958.2009.07026.x.
- 60. Kolmogorov A: Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari. 1933, 4: 83-91.
- 63. Corredor V, Meyer EV, Lapp S, Corredor-Medina C, Huber CS, Evans AG, Barnwell JW, Galinski MR: A SICAvar switching event in Plasmodium knowlesi is associated with the DNA rearrangement of conserved 3' non-coding sequences. Mol Biochem Parasitol. 2004, 138 (1): 37-49. 10.1016/j.molbiopara.2004.05.017.
- 64. Frank M, Kirkman L, Costantini D, Sanyal S, Lavazec C, Templeton TJ, Deitsch KW: Frequent recombination events generate diversity within the multi-copy variant antigen gene families of Plasmodium falciparum. Int J Parasitol. 2008, 38 (10): 1099-1109. 10.1016/j.ijpara.2008.01.010.
- 66. Piel W, Donoghue MJ, Sanderson MJ: TreeBASE: a database of phylogenetic knowledge. To the interoperable "Catalog of Life" with partners Species 2000 Asia Oceanea Research Report from the National Institute for Environmental Studies No 171. Edited by: Shimura J, Wilson KL, Gordon D. 2002, Tsukuba, Japan, 41-47.
- 67. Sanderson MJ, Donoghue MJ, Piel W, Eriksson T: TreeBASE: a prototype database of phylogenetic analyses and an interactive tool for browsing the phylogeny of life. American Journal of Botany. 1994, 81 (6): 183.
- 70. Sponaas AM, Do Rosario APF, Voisine C, Mastelic B, Thompson J, Koernig S, Jarra W, Renia L, Mauduit M, Potocnik AJ: Migrating monocytes recruited to the spleen play an important role in control of blood stage malaria. Blood. 2009, 114 (27): 5522-5531. 10.1182/blood-2009-04-217489.
- 72. Su XZ, Heatwole VM, Wertheimer SP, Guinet F, Herrfeldt JA, Peterson DS, Ravetch JA, Wellems TE: The large diverse gene family var encodes proteins involved in cytoadherence and antigenic variation of Plasmodium falciparum-infected erythrocytes. Cell. 1995, 82 (1): 89-100. 10.1016/0092-8674(95)90055-1.
- 75. Fernandez-Becerra C, Pein O, de Oliveira TR, Yamamoto MM, Cassola AC, Rocha C, Soares IS, de Braganca Pereira CA, del Portillo HA: Variant proteins of Plasmodium vivax are not clonally expressed in natural infections. Mol Microbiol. 2005, 58 (3): 648-658. 10.1111/j.1365-2958.2005.04850.x.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.